1. REAL PARTY IN INTEREST

The real party in interest is Genentech, Inc., South San Francisco, California, by an 
assignment of the patent application U.S. Serial No. 09/918,585 recorded July 30, 2001, at Reel
012095 and Frame 0677.

2. RELATED APPEALS AND INTERFERENCES 

There are no related appeals or interferences known to Appellants, Appellants' legal 
representative, or Appellants' assignee that will directly affect or be directly affected by or have a 
bearing on the Board's decision in the present appeal. 

3. STATUS OF CLAIMS 

Claims 58-62 are in this application. 
Claims 1-57 and 63 are canceled. 

Claims 58-62 stand rejected and Appellants appeal the rejection of these claims. 
A copy of the rejected claims involved in the present Appeal is provided in the Claims 
Appendix. 

4. STATUS OF AMENDMENTS 

There were no amendments to the claims submitted after final rejection. All previous 
amendments to the claims have been entered. 

5. SUMMARY OF CLAIMED SUBJECT MATTER 

The invention claimed in the present application concerns an isolated antibody that 
specifically binds to the polypeptide of SEQ ID NO: 132 (Claim 58). The invention further 
provides monoclonal antibodies (Claim 59), humanized antibodies (Claim 60), antibody 
fragments (Claim 61), and labeled antibodies (Claim 62) that specifically bind to the polypeptide 
of SEQ ID NO: 132. 

Support for the preparation and uses of antibodies is found throughout the specification, 
including, for example, pages 217-225. The preparation of antibodies is described in Example
104, while Example 106 describes the use of the antibodies for purifying the polypeptides to
which they bind. Isolated antibodies are defined in the specification at page 132, lines 29-38. 
Support for monoclonal antibodies is found in the specification at, for example, page 217, line 
30, to page 219, line 11, and Example 104. Support for humanized antibodies is found in the 
specification at, for example, page 219, line 12, to page 220, line 14. Support for antibody 
fragments is found in the specification at, for example, page 131, line 29, to page 132, line 22, 
and page 221, lines 6-34. Support for labeled antibodies is found in the specification at, for 
example, page 133, lines 1-4, and page 224, line 35, to page 225, line 4. 

The polypeptide of SEQ ID NO: 132 is designated PR0351, and its amino acid sequence 
is shown in Figure 49, while the encoding nucleic acid sequence (SEQ ID NO: 131) is shown in
Figure 48. Page 112, line 37, through page 113, line 2, of the specification provides the description
for Figures 48 and 49. The specification discloses that various portions of the PR0351 
polypeptide possess significant sequence similarity to the serine protease prostasin (see, for 
example, page 12, lines 21-33). The isolation of cDNA clones encoding PR0351 of SEQ ID 
NO: 7 is described in Example 22. Examples 100-103 describe the expression of PRO 
polypeptides in various host cells, including E. coli, mammalian cells, yeast and Baculovirus- 
infected insect cells. Finally, Example 114, in the specification at page 331, line 23, to page 346,
line 4, sets forth a Gene Amplification assay which shows that the PR0351 gene is amplified in 
the genome of certain human lung cancers (see Table 9). 

The specification discloses that antibodies to PRO polypeptides may be used, for 
example, in purification of PRO (page 225, lines 5-11 and Example 106), in diagnostic assays for
PRO expression (page 190, lines 3-9, and page 224, line 21 to page 225, line 4), as antagonists to 
PRO (page 198, lines 3-6), and as elements of pharmaceutical compositions for the treatment of 
various disorders (page 223, line 30, to page 224, line 28). 

6. GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL 

I. Whether Claims 58-62 satisfy the utility requirement of 35 U.S.C. §101.

II. Whether Claims 58-62 satisfy the enablement requirement of 35 U.S.C. §112, first
paragraph.

7. ARGUMENT
Summary of the Arguments: 

Issue I: Utility 

Patentable utility of the PR0351 polypeptide and the antibodies which bind it is based 
upon the gene amplification data for the gene encoding the PR0351 polypeptide. The
specification discloses that the gene encoding PR0351 showed significant amplification, ranging
from 2.03- to 2.75-fold, in ten different primary lung tumors. The Declaration of Dr. Audrey
Goddard, submitted with Appellants' Response filed April 29, 2004, explains that a gene
identified as being amplified at least 2-fold by the disclosed gene amplification assay in a tumor
sample relative to a normal sample is useful as a marker for the diagnosis of cancer, for
monitoring cancer development and/or for measuring the efficacy of cancer therapy. 
Accordingly, the Examiner's assertion that "the specification provides data showing a very small 
increase in DNA copy number, approximately 2-fold, in a few tumor samples for PR0351" 
(Page 2 of the Advisory Action mailed October 14, 2004) is both factually and scientifically 
incorrect. By referring to the 2.03-fold to 2.75-fold amplification of the PR0351 gene in lung
tumors as "very small," the Examiner ignores the teachings of an expert declaration without any
basis and without presenting any evidence to the contrary.

The Examiner has asserted that the control sample used in the disclosed gene 
amplification experiments was not a proper control, stating that "the art does not consider pooled, 
unrelated DNA samples to be an appropriate control." (Page 5 of the Office Action mailed 
August 29, 2005). Appellants submit that the negative control taught in the specification was 
known in the art at the time of filing, and accepted as a true negative control as demonstrated by 
use in peer-reviewed publications.

The Examiner has asserted that "damaged, precancerous lung epithelium is often 
aneuploid," and stated that "[o]ne skilled in the art would not conclude that PR0351 is a 
diagnostic probe for lung cancer unless it is clear that PR0351 is amplified to a clearly greater 
extent in true lung tumor tissue relative to non-cancerous lung epithelium." (Page 4 of the Office 
Action mailed August 29, 2005). In support of this assertion the Examiner cited a reference by 
Hittelman. Hittelman actually shows that an increase in chromosome number or gene
amplification is associated not with normal tissues, but with cancerous or pre-cancerous tissues,
and therefore, an increase in chromosome number or gene amplification is a useful marker for a 
cancerous or pre-cancerous state. 

The Examiner has asserted that the disclosed gene amplification data does not establish a 
patentable utility for the PR0351 polypeptide or the claimed antibodies that bind it because 
allegedly "[t]he art demonstrates that gene amplification does not reliably correlate with 
increased mRNA transcript levels or increased polypeptide levels," citing references by Pennica
et al. and Konopka et al. (Page 4 of the Office Action mailed February 4, 2004). Haynes et al.
was cited "as providing evidence that polypeptide expression levels cannot be accurately
predicted from mRNA levels" (Page 3 of the Advisory Action mailed October 14, 2004). In
further support of the alleged lack of correlation between mRNA levels and polypeptide levels,
the Examiner has cited additional references by Hu et al., LaBaer, Chen et al., Lian et al., Fessler
et al., Gygi et al., and Greenbaum et al.

The Examiner's reference to the lack of necessary correlation or accurate prediction in some 
of the rejections (as Appellants will discuss in the detailed arguments) clearly shows that the 
Examiner applied an improper legal standard when making this rejection. The evidentiary 
standard to be used throughout ex parte examination in setting forth a rejection is a 
preponderance of the totality of the evidence under consideration. Thus, to overcome the 
presumption of truth that an assertion of utility by the Appellant enjoys, the Examiner must 
establish that it is more likely than not that one of ordinary skill in the art would doubt the truth 
of the statement of utility. Only after the Examiner has made a proper prima facie showing of 
lack of utility, does the burden of rebuttal shift to the Appellant. The references cited by the 
Examiner do not suffice to make a prima facie case that more likely than not no generalized 
correlation exists between gene (DNA) amplification and increased mRNA and polypeptide 
levels. 

In contrast, Appellants have submitted ample evidence to show that, in general, if a gene 
is amplified in cancer, it is more likely than not that the encoded protein will be expressed at an 
elevated level. First, the articles by Orntoft et al., Hyman et al., and Pollack et al. (made of
record in Appellants' Response filed April 29, 2004) collectively teach that, in general, gene
amplification increases mRNA expression. Second, the Declaration of Dr. Paul Polakis,
principal investigator of the Tumor Antigen Project of Genentech, Inc., the assignee of the
present application, shows that, in general, there is a correlation between mRNA levels and
polypeptide levels. Appellants further note that the sale of gene expression chips to measure
mRNA levels is a highly successful business, with a company such as Affymetrix recording 
168.3 million dollars in sales of their GeneChip arrays in 2004. Clearly, the research community 
believes that the information obtained from these chips is useful (i.e., that it is more likely than 
not informative of the protein level). 

Taken together, although there are some examples in the scientific art that do not fit 
within the central dogma of molecular biology that there is a correlation between DNA, mRNA, 
and polypeptide levels, these instances are exceptions rather than the rule. In the majority of
amplified genes, as exemplified by Orntoft et al., Hyman et al., Pollack et al., and the Polakis
Declaration, the teachings in the art overwhelmingly show that gene amplification influences
gene expression at the mRNA and protein levels. Therefore, one of skill in the art would
reasonably expect in this instance, based on the amplification data for the PR0351 gene, that the
PR0351 polypeptide is concomitantly overexpressed. Thus, the claimed antibodies that bind the
PR0351 polypeptide have utility in the diagnosis of cancer. 

Even if there were no correlation between gene amplification and increased mRNA/protein
expression (which Appellants expressly do not concede), a polypeptide encoded by a gene that is
amplified in cancer would still have a specific, substantial, and credible utility. As evidenced by
the Ashkenazi Declaration and the teachings of Hanna and Mornin, simultaneous testing of gene
amplification and gene product over-expression enables more accurate tumor classification, even
if the gene product, the protein, is not over-expressed. This leads to better determination of a
suitable therapy for the tumor, as demonstrated by the real-world example of the breast cancer 
marker HER-2/neu. 

Accordingly, Appellants submit that when the proper legal standard is applied, one
should reach the conclusion that the present application discloses at least one patentable utility
for the PR0351 polypeptide and the claimed antibodies which bind it.

Issue II: Enablement

Claims 58-62 stand rejected under 35 U.S.C. §112, first paragraph, allegedly "since the 
claimed invention is not supported by either a credible, specific and substantial asserted utility or a 
well established utility for the reasons set forth above, one skilled in the art clearly would not know 
how to use the claimed invention." (Pages 2-3 of the Office Action mailed August 29, 2005). 

Appellants submit that, as discussed above, the PR0351 polypeptide and the antibodies
that bind it have utility in the diagnosis of cancer. Based on such a utility, one of skill in the art 
would know exactly how to use the claimed antibodies for diagnosis of cancer, without any 
undue experimentation. 

These arguments are all discussed in further detail below under the appropriate headings. 

ISSUE I: Claims 58-62 satisfy the utility requirement of 35 U.S.C. §101 

Claims 58-62 stand rejected under 35 U.S.C. §101 because allegedly "the claimed 
invention is not supported by either a credible, specific and substantial asserted utility or a well 
established utility." (Page 2 of the Office Action mailed August 29, 2005). 

Appellants submit, for the reasons set forth below, that the specification discloses at least 
one credible, substantial and specific asserted utility for the claimed antibodies that bind the 
PR0351 polypeptide. 

A. The Legal Standard for Utility 

According to 35 U.S.C. §101: 

Whoever invents or discovers any new and useful process, machine, manufacture, or 
composition of matter, or any new and useful improvement thereof, may obtain a 
patent therefor, subject to the conditions and requirements of this title. (Emphasis 
added.) 

In interpreting the utility requirement, in Brenner v. Manson, 1 the Supreme Court held that
the quid pro quo contemplated by the U.S. Constitution between the public interest and the
interest of the inventors required that a patent Appellant disclose a "substantial utility" for his or
her invention, i.e., a utility "where specific benefit exists in currently available form." 2 The Court
concluded that "a patent is not a hunting license. It is not a reward for the search, but
compensation for its successful conclusion. A patent system must be related to the world of
commerce rather than the realm of philosophy." 3

1 Brenner v. Manson, 383 U.S. 519, 148 U.S.P.Q. (BNA) 689 (1966).

2 Id. at 534, 148 U.S.P.Q. (BNA) at 695.

Later, in Nelson v. Bowler, 4 the C.C.P.A. acknowledged that tests evidencing 
pharmacological activity of a compound may establish practical utility, even though they may not 
establish a specific therapeutic use. The court held that "since it is crucial to provide researchers 
with an incentive to disclose pharmaceutical activities in as many compounds as possible, we 
conclude adequate proof of any such activity constitutes a showing of practical utility." 5 

In Cross v. Iizuka, 6 the C.A.F.C. reaffirmed Nelson, and added that in vitro results might 
be sufficient to support practical utility, explaining that "in vitro testing, in general, is relatively 
less complex, less time consuming, and less expensive than in vivo testing. Moreover, in vitro 
results with the particular pharmacological activity are generally predictive of in vivo test results, 
i.e. there is a reasonable correlation there between." 7 The court perceived "No insurmountable 
difficulty" in finding that, under appropriate circumstances, "in vitro testing, may establish a 
practical utility." 8 

The case law has also clearly established that Appellants' statements of utility are usually
sufficient, unless such statement of utility is unbelievable on its face. 9 The PTO has the initial
burden to prove that Appellants' claims of usefulness are not believable on their face. 10 In
general, an Appellant's assertion of utility creates a presumption of utility that will be sufficient
to satisfy the utility requirement of 35 U.S.C. §101, "unless there is a reason for one skilled in the
art to question the objective truth of the statement of utility or its scope." 11, 12

3 Id. at 536, 148 U.S.P.Q. (BNA) at 696.

4 Nelson v. Bowler, 626 F.2d 853, 206 U.S.P.Q. (BNA) 881 (C.C.P.A. 1980).

5 Id. at 856, 206 U.S.P.Q. (BNA) at 883.

6 Cross v. Iizuka, 753 F.2d 1047, 224 U.S.P.Q. (BNA) 739 (Fed. Cir. 1985).

7 Id. at 1050, 224 U.S.P.Q. (BNA) at 747.

8 Id.

9 In re Gazave, 379 F.2d 973, 154 U.S.P.Q. (BNA) 92 (C.C.P.A. 1967).

10 Ibid.

Compliance with 35 U.S.C. §101 is a question of fact. 13 The evidentiary standard to be 
used throughout ex parte examination in setting forth a rejection is a preponderance of the 
totality of the evidence under consideration. 14 Thus, to overcome the presumption of truth that 
an assertion of utility by the Appellant enjoys, the Examiner must establish that it is more likely 
than not that one of ordinary skill in the art would doubt the truth of the statement of utility. 
Only after the Examiner has made a proper prima facie showing of lack of utility does the burden of
rebuttal shift to the Appellant. The issue will then be decided on the totality of evidence.

The well established case law is clearly reflected in the Utility Examination Guidelines 
("Utility Guidelines") 15 , which acknowledge that an invention complies with the utility 
requirement of 35 U.S.C. §101, if it has at least one asserted "specific, substantial, and credible 
utility" or a "well-established utility." Under the Utility Guidelines, a utility is "specific" when it 
is particular to the subject matter claimed. For example, it is generally not enough to state that a 
nucleic acid is useful as a diagnostic without also identifying the conditions that are to be 
diagnosed. 

In explaining the "substantial utility" standard, M.P.E.P. §2107.01 cautions, however, 
that Office personnel must be careful not to interpret the phrase "immediate benefit to the public" 
or similar formulations used in certain court decisions to mean that products or services based on 
the claimed invention must be "currently available" to the public in order to satisfy the utility 
requirement. "Rather, any reasonable use that an Appellant has identified for the invention that
can be viewed as providing a public benefit should be accepted as sufficient, at least with regard
to defining a 'substantial' utility." 16 Indeed, the Guidelines for Examination of Applications for
Compliance With the Utility Requirement, 17 gives the following instruction to patent examiners:
"If the Appellant has asserted that the claimed invention is useful for any particular practical
purpose . . . and the assertion would be considered credible by a person of ordinary skill in the
art, do not impose a rejection based on lack of utility."

11 In re Langer, 503 F.2d 1380, 1391, 183 U.S.P.Q. (BNA) 288, 297 (C.C.P.A. 1974).

12 See also In re Jolles, 628 F.2d 1322, 206 U.S.P.Q. 885 (C.C.P.A. 1980); In re Irons, 340 F.2d 974, 144
U.S.P.Q. 351 (1965); In re Sichert, 566 F.2d 1154, 1159, 196 U.S.P.Q. 209, 212-13 (C.C.P.A. 1977).

13 Raytheon v. Roper, 724 F.2d 951, 956, 220 U.S.P.Q. (BNA) 592, 596 (Fed. Cir. 1983), cert. denied, 469
U.S. 835 (1984).

14 In re Oetiker, 977 F.2d 1443, 1445, 24 U.S.P.Q.2d (BNA) 1443, 1444 (Fed. Cir. 1992).

15 66 Fed. Reg. 1092 (2001).

B. The Data and Documentary Evidence Supporting a Patentable Utility 

Appellants rely on the gene amplification data for the
patentable utility of the claimed antibodies that bind the PR0351 polypeptide. The gene
amplification data for the gene encoding the PR0351 polypeptide are clearly disclosed in
Example 114 of the instant specification.

It was well known in the art at the time the invention was made that gene amplification is
an essential mechanism for oncogene activation. The gene amplification assay is well-described
in Example 114 of the present application. Example 114 discloses that the inventors isolated
genomic DNA from a variety of primary cancers and cancer cell lines that are listed in Table 9,
including primary lung and colon tumors of the type and stage indicated in Table 8. As a
negative control, DNA was isolated from the cells of ten normal healthy individuals and
pooled. Gene amplification was monitored using real-time quantitative
TaqMan™ PCR. Table 9 shows the resulting gene amplification data. Further, Example 114
explains that the results of TaqMan™ PCR are reported in ΔCt units, wherein one unit
corresponds to one PCR cycle or approximately a 2-fold amplification relative to control, two
units correspond to 4-fold amplification, three units to 8-fold amplification, etc.
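
By way of illustration only, the relationship between ΔCt (delta Ct) units and fold amplification described above can be sketched as a short calculation. The sketch below is not part of the record. It assumes ideal PCR efficiency, i.e., exact doubling per cycle, so that fold amplification equals 2 raised to the ΔCt value, and the function name `fold_from_delta_ct` is hypothetical.

```python
def fold_from_delta_ct(delta_ct: float) -> float:
    """Convert a delta-Ct value to fold amplification relative to control.

    Assumption (illustrative only): each PCR cycle doubles the product
    exactly, so one delta-Ct unit corresponds to a 2-fold difference.
    """
    return 2.0 ** delta_ct


if __name__ == "__main__":
    # Whole units from the text (1, 2, 3), followed by the range the
    # brief reports for Table 9 (1.02 to 1.46 delta-Ct units).
    for delta_ct in (1.0, 2.0, 3.0, 1.02, 1.46):
        fold = fold_from_delta_ct(delta_ct)
        print(f"delta-Ct {delta_ct:.2f} -> {fold:.2f}-fold")
```

Under that assumption, ΔCt values of 1.02 and 1.46 round to approximately 2.03-fold and 2.75-fold, respectively.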

Appellants respectfully submit that a ΔCt value of at least 1.0, which is an at least
2-fold increase, was observed for PR0351 in at least ten of the lung tumors listed in Table 9.
Table 9 teaches that the nucleic acids encoding PR0351 showed 1.02 to 1.46 ΔCt units, which
corresponds to 2^1.02- to 2^1.46-fold amplification, or 2.03- to 2.75-fold amplification, in ten types of
human primary lung tumors: LT9, LT10, LT11, LT13, LT15, LT16, LT17, LT18, LT19 and
LT21. Accordingly, the present specification clearly discloses strong evidence that the gene
encoding the PR0351 polypeptide is significantly amplified in a significant number of lung
tumors.

16 M.P.E.P. §2107.01.

17 M.P.E.P. §2107 II(B)(1).

It is also well known that gene amplification occurs in most solid tumors, and generally is 
associated with poor prognosis. 

In support, Appellants have submitted, in their Response filed May 20, 2005, a
Declaration by Dr. Audrey Goddard. Appellants particularly draw the Board's attention to page
3 of the Goddard Declaration, which clearly states that:

It is further my considered scientific opinion that an at least 2-fold increase in
gene copy number in a tumor tissue sample relative to a normal (i.e., non-tumor)
sample is significant and useful in that the detected increase in gene copy 
number in the tumor sample relative to the normal sample serves as a basis for 
using relative gene copy number as quantitated by the TaqMan PCR technique 
as a diagnostic marker for the presence or absence of tumor in a tissue sample of 
unknown pathology. Accordingly, a gene identified as being amplified at least 
2-fold by the quantitative TaqMan PCR assay in a tumor sample relative to a 
normal sample is useful as a marker for the diagnosis of cancer, for 
monitoring cancer development and/or for measuring the efficacy of cancer 
therapy. (Emphasis added). 

As indicated above, the gene encoding the PR0351 polypeptide shows greater than
two-fold amplification in ten different lung tumors. In addition, the Goddard
Declaration clearly establishes that the TaqMan real-time PCR method described in Example 114
has gained wide recognition for its versatility, sensitivity and accuracy, and is in extensive use for 
the study of gene amplification. The facts disclosed in the Declaration also confirm that based 
upon the gene amplification results, one of ordinary skill would find it credible that PR0351 is a 
diagnostic marker of lung cancer. 

The Examiner has asserted that "the specification provides data showing a very small
increase in DNA copy number, approximately 2-fold, in a few tumor samples for PR0351."
(Page 2 of the Advisory Action mailed October 14, 2004). The Examiner further asserts that "it
was imperative to find evidence in the relevant scientific literature whether or not a small
increase in DNA copy number would be considered by the skilled artisan to be predictive of
increased mRNA and polypeptide levels." (Page 3 of the Advisory Action mailed October 14,
2004).

Appellants respectfully submit that the Examiner seems to be applying a heightened
utility standard in this instance, which is legally incorrect. Appellants have shown that the gene
encoding PR0351 demonstrated significant amplification, from 2.03- to 2.75-fold, in ten lung
tumors. As explained in the Declaration of Dr. Audrey Goddard (submitted with the Response
filed May 20, 2005):

It is further my considered scientific opinion that an at least 2-fold increase in
gene copy number in a tumor tissue sample relative to a normal (i.e., non-tumor)
sample is significant and useful in that the detected increase in gene copy 
number in the tumor sample relative to the normal sample serves as a basis for 
using relative gene copy number as quantitated by the TaqMan PCR technique 
as a diagnostic marker for the presence or absence of tumor in a tissue sample of 
unknown pathology. (Emphasis added). 

By referring to the 2.03-fold to 2.75-fold amplification of the PR0351 gene in lung
tumors as "very small," the Examiner appears to ignore the teachings within an expert's
declaration without any basis and without presenting any evidence to the contrary. Appellants
respectfully draw the Board's attention to the Utility Examination Guidelines (Part IIB, 66 Fed.
Reg. 1098 (2001)), which state that:

Office personnel must accept an opinion from a qualified expert that is based 
upon relevant facts whose accuracy is not being questioned; it is improper to 
disregard the opinion solely because of a disagreement over the significance or 
meaning of the facts offered. 

Thus, barring evidence to the contrary, Appellants maintain that the 2.03 to 2.75-fold 
amplification disclosed for the PR0351 gene is significant and forms the basis for the utility 
claimed herein. 

1. The pooled normal blood sample is a valid negative control for gene
amplification experiments

The Examiner has noted that "Table 9 reports a comparison of lung tumor tissue samples 
with a pooled sample of DNA from normal cells but not matched tissue samples (i.e., normal 
lung epithelium tissue)" and has asserted that "it is not clear if Dr. Goddard intended the phrase 
'normal samples' to include unrelated tissue samples such as those used in the specification."
The Examiner concluded that "the art does not consider pooled, unrelated DNA samples to be an
appropriate control." (Page 5 of the Office Action mailed August 29, 2005).

Appellants respectfully submit that the negative control taught in the specification was 
known in the art at the time of filing, and accepted as a true negative control as demonstrated by 
use in peer-reviewed publications. For example, Pennica et al., cited by the
Examiner in the Office Action mailed February 24, 2004, explain that "[t]he relative WISP gene
copy number in each colon tumor DNA was compared with pooled normal DNA from 10
donors by quantitative PCR" (page 14720, col. 2; emphasis added). Pennica et al. further explain
that DNA was isolated from "the pooled blood of 10 normal human donors" (page 14718, col. 1).
Thus, Pennica et al. used the same control for their gene amplification experiments as that
described in the instant specification.

In further examples, Pitti et al. (Exhibit F submitted with the Response filed May 20,
2005) used the same quantitative TaqMan PCR assay described in the specification to study
gene amplification in lung and colon cancer of DcR3, a decoy receptor for Fas ligand. As
described, Pitti et al. analyzed DNA copy number "in genomic DNA from 35 primary lung and
colon tumours, relative to pooled genomic DNA from peripheral blood leukocytes (PBL) of 10
healthy donors." (Page 701, col. 1; emphasis added). The authors also analyzed mRNA
expression of DcR3 in primary tumor tissue sections and found tumor-specific expression,
confirming the finding of frequent amplification in tumors, and confirming that the pooled blood
sample was a valid negative control for the gene amplification experiments. In Bieche et al.
(Exhibit G submitted with the Response filed May 20, 2005), the authors used the quantitative
TaqMan PCR assay to study gene amplification of myc, ccnd1 and erbB2 in breast tumors. As
their negative control, Bieche et al. used normal leukocyte DNA derived from a small subset of
the breast cancer patients (page 663). The authors note that "[t]he results of this study are 
consistent with those reported in the literature" (page 664, col. 2), thus confirming the validity of 
the negative control. Accordingly, the art demonstrates that pooled normal blood samples are 
considered to be a valid negative control for gene amplification experiments of the type described 
in the specification.

The Examiner has asserted that "both Pitti et al. and Bieche et al. did not rely solely upon
the PCR assay using a control from blood genomic DNA to make conclusions." The Examiner 
notes in particular that "Pitti et al. also looked at northern blot analysis, ligand binding analysis, 
apoptosis induction analysis, and in situ hybridization analysis" (Page 2 of the Advisory Action 
mailed December 7, 2005). 

Appellants respectfully point out that northern blot analysis and in situ hybridization 
analysis were used to measure RNA levels, while ligand binding analysis and apoptosis induction 
analysis were used to measure the activity of the encoded protein. The sole technique that Pitti et 
al. relied upon to measure gene amplification was the PCR assay using the pooled blood control. 
While the Examiner notes that the authors used an additional control in the PCR assays, using 
flanking DNA regions in tumor samples compared to blood DNA samples, Appellants 
respectfully point out that because this assay also used the pooled normal blood sample as the 
comparison, it only supports Appellants' point that pooled normal blood sample is a valid 
control. 

The Examiner further asserted that "Bieche et al. relied upon Southern blotting to confirm 
the PCR results and note that not all samples showing PCR amplification also showed 
amplification by Southern blotting," adding that "[t]his was especially true for sequences that 
were amplified at low levels comparable to the levels that instant PR0351 was shown to be 
amplified." (Page 3 of the Advisory Action mailed December 7, 2005). Appellants respectfully 
direct the Board's attention to the paragraph immediately following that referenced by the 
Examiner, where the authors explain that "[g]ene amplification status has been studied mainly by 
means of Southern blotting, but this method is not sensitive enough to detect low-level gene 
amplification, nor accurate enough to quantify the full range of amplification values" (page 664, 
col. 1; emphasis added). Thus the fact that the samples showing lower levels of gene 
amplification by PCR analysis did not necessarily also show amplification by Southern blotting 
is precisely what would be expected given that the PCR analysis is a more sensitive technique, 
and can reliably detect levels of amplification missed by the less sensitive Southern blotting 
technique. The fact that, as Bieche et al. report, "[t]he results of this study are consistent with
those reported in the literature" (page 664, col. 2), confirms the validity of the pooled normal
blood control used in the PCR experiments.

Finally, the Examiner has noted that "publications have been cited as evidence that 
matched, cancer-free tissue samples are used as controls." (Page 3 of the Advisory Action 
mailed December 7, 2005). Appellants do not dispute that such matched, cancer-free samples 
may be used as controls. Appellants argue that pooled normal blood samples may also be used as 
an equally valid control, and have cited publications as evidence that such pooled normal blood 
samples are in fact used in the art as controls. 

2. Aneuploidy is associated with cancerous or precancerous tissues, not
normal tissues

The Examiner has asserted that "damaged, precancerous lung epithelium is often 
aneuploid," and stated that "[o]ne skilled in the art would not conclude that PR0351 is a
diagnostic probe for lung cancer unless it is clear that PR0351 is amplified to a clearly greater 
extent in true lung tumor tissue relative to non-cancerous lung epithelium." (Page 4 of the Office 
Action mailed August 29, 2005). In support of this assertion the Examiner cited a reference by 
Hittelman. 

Appellants note that the title of the Hittelman paper is "Genetic Instabilities in Epithelial 
Tissues at Risk for Cancer." Hittelman studied lung tissue from chronic smokers, which had 
been exposed for years to carcinogenic tobacco smoke. As Hittelman explains, "[t]umors of the 
aerodigestive tract have been proposed to reflect a 'field cancerization' process whereby the 
whole tissue is exposed to carcinogenic insult (e.g., tobacco smoke) and is at increased risk for 
multistep tumor development" (page 3). The detection of increases in chromosome number 
therefore identifies cells which have begun the first steps in this multistep progression to cancer. 
Even if these particular epithelial regions are not yet cancerous, their presence is strongly 
correlated with the development of cancer in the target tissue as a whole. Accordingly, 
Hittelman concludes that "the measurement of chromosome instability in the target tissue 
will be useful in assessing cancer risk as well as response to intervention" (page 10; emphasis 
added).

Accordingly, Hittelman shows that an increase in chromosome number or gene
amplification is associated not with normal tissues, but with cancerous or pre-cancerous tissues,
and therefore, an increase in chromosome number or gene amplification is a useful marker for a
cancerous or pre-cancerous state. Detection of pre-cancerous cells or tissues is useful because, as
explained by Hittelman, it allows for assessing cancer risk, as well as response to intervention.
Hence, Appellants respectfully submit that whether a pre-cancerous or tumor sample were
analyzed, the showing of DNA amplification of the PRO351 gene would still be significant,
since it would lead to the diagnosis of either a pre-cancerous state or a cancerous state, which is
the utility asserted here. Despite the Examiner's assertion that such a use "is not well-established
in the prior art," (page 4 of the Office Action mailed August 29, 2005) it is clear, as discussed
above, that the use of amplified genes as markers for assessing cancer risk is explicitly
contemplated in Hittelman et al. Thus even if the DNA amplification observed for PRO351 was
correlated to pre-cancerous rather than cancerous lung tissue, it would still provide utility for
PRO351.

C. A prima facie case of lack of utility has not been established 

The Examiner has asserted that the disclosed gene amplification data does not establish a
patentable utility for the PRO351 polypeptide or the claimed antibodies that bind it because
allegedly "[t]he art demonstrates that gene amplification does not reliably correlate with
increased mRNA transcript levels or increased polypeptide levels," citing references by Pennica
et al. and Konopka et al. (Page 4 of the Office Action mailed February 24, 2004). Haynes et al.
was cited "as providing evidence that polypeptide expression levels cannot be accurately
predicted from mRNA levels" (Page 3 of the Advisory Action mailed October 14, 2004). In
further support of the alleged lack of correlation between mRNA levels and polypeptide levels,
the Examiner has cited additional references by Hu et al., LaBaer, Chen et al., Lian et al., Fessler
et al., Gygi et al. and Greenbaum et al.

As a preliminary matter, Appellants respectfully submit that it is not a legal requirement 
to establish that gene amplification necessarily results in increased expression at the mRNA and 
polypeptide levels, or that protein levels can be "accurately predicted." As discussed above, the 
evidentiary standard to be used throughout ex parte examination of a patent application is a
preponderance of the totality of the evidence under consideration. Accordingly, Appellants
submit that in order to overcome the presumption of truth that an assertion of utility by the 
Appellant enjoys, the Examiner must establish that it is more likely than not that one of 
ordinary skill in the art would doubt the truth of the statement of utility. Therefore, it is not 
legally required that there be a "necessary" correlation between the data presented and the 
claimed subject matter. The law requires only that one skilled in the art should accept that such a 
correlation is more likely than not to exist. Appellants respectfully submit that when the proper
evidentiary standard is applied, a correlation must be acknowledged. 

1. Pennica et al. and Konopka et al.

In support of the assertion that "[t]he art demonstrates that gene amplification does not 
reliably correlate with increased mRNA transcript levels or increased polypeptide levels," the 
Examiner has cited references by Pennica et al. and Konopka et al. (Page 4 of the Office Action
mailed February 24, 2004). In particular, the Examiner cited the abstract of Pennica et al. for its
disclosure that "WISP-1 gene amplification and overexpression in human colon tumors showed a 
correlation between DNA amplification and over-expression, whereas overexpression of WISP-3 
RNA was seen in the absence of DNA amplification. In contrast, WISP-2 DNA was amplified in 
colon tumors, but its mRNA expression was significantly reduced in the majority of tumors 
compared with expression in normal colonic mucosa from the same patient." From this, the 
Examiner correctly concluded that increased copy number does not necessarily result in 
increased polypeptide expression. The standard, however, is not absolute certainty.

In fact, as noted even in Pennica et al., "[a]n analysis of WISP-1 gene amplification and
expression in human colon tumors showed a correlation between DNA amplification and
over-expression..." (Pennica et al., page 14722, left column, first full paragraph, emphasis added).
Thus the findings of Pennica et al. with respect to WISP-1 support Appellants' arguments. In the 
case of WISP-3, the authors report that there was no change in the DNA copy number, but there 
was a change in mRNA levels. This apparent lack of correlation between DNA and mRNA 
levels is not contrary to Appellants' assertion that a change in DNA copy number generally leads 
to a change in mRNA level. Appellants are not attempting to predict the DNA copy number 
based on changes in mRNA level, and Appellants have not asserted that the only means for
changing the level of mRNA is to change the DNA copy number. Therefore a change in mRNA
without a change in DNA copy number is not contrary to Appellants' assertions. 

The fact that the single WISP-2 gene did not show the expected correlation of gene 
amplification with the level of mRNA/protein expression does not establish that it is more likely 
than not, in general, that such correlation does not exist. The Examiner has not shown whether 
the lack of correlation observed for the WISP-2 gene is typical, or is merely a discrepancy, an
exception to the rule of correlation. Indeed, the working hypothesis among those skilled in the
art is that, if a gene is amplified in cancer, the encoded protein is likely to be expressed at an
elevated level, as was demonstrated for WISP-1.

Accordingly, Appellants respectfully submit that Pennica et al. teaches nothing
conclusive regarding the absence of correlation between amplification of a gene and
over-expression of the encoded WISP polypeptide. More importantly, the teaching of Pennica et al. is
specific to WISP genes. Pennica et al. has no teaching whatsoever about the correlation of gene
amplification and protein expression in general.

The Examiner argues that Pennica et al. is relevant even though it is limited to only one
gene family because it is "evidence that one skilled in the art cannot assume that any one gene's
amplification results in protein over-expression" and because the instant case also concerns a
single gene. (Page 5 of the Office Action mailed August 29, 2005). Appellants disagree. The test
is whether it is more likely than not that gene amplification results in overexpression of the 
corresponding mRNA and protein. In order to meet that standard, the Examiner must provide 
evidence that it is more likely than not that gene amplification does not result in mRNA or 
protein overexpression. Providing the single example of the WISP-2 gene does not suffice to 
meet this burden. 

Appellants next respectfully submit that, contrary to the PTO's assertions, Konopka et al.
supports Appellants' position that mRNA levels correlate with protein levels. Konopka et al.
states that "the 8-kb mRNA that encodes P210c-abl was detected at a 10-fold higher level in
SK-CML7bt-333 (Fig. 3A, +) than in SK-CML16BM (B, +), which correlated with the relative
level of P210c-abl detected in each cell line. Analysis of additional cell lines demonstrated that
the level of 8-kb mRNA directly correlated with the level of P210c-abl (Table 1)" (page 4050,
col. 2, emphasis added).

Nor does Konopka et al. support the PTO's position that DNA amplification is not
correlated with mRNA or protein overexpression. Konopka et al. show only that, of the cell
lines known to have increased abl protein expression, only one had amplification of the abl gene 
(page 4051, col. 1). This result proves only that increased mRNA and protein expression levels 
can result from causes other than gene amplification. Konopka et al. do not demonstrate that 
when gene amplification does occur, it does not result in increased mRNA and protein 
expression levels, particularly given that the cell line with amplification of the abl gene did show 
increased abl mRNA and protein expression levels. 

2. Hu et al. and LaBaer

The Examiner refers to the reference by Hu et al. as allegedly demonstrating that it is 
incorrect that increased mRNA production leads to increased protein production. In particular, 
the Examiner cites Hu et al. to the effect that genes displaying a 5-fold change or less in mRNA
expression in tumors compared to normal showed no evidence of a correlation between altered 
gene expression and a known role in the disease. However, among genes with a 10-fold or more 
change in expression level, there was a strong and significant correlation between expression 
level and a published role in the disease. (Page 9 of the Office Action mailed August 19, 2005). 

Appellants submit that in order to overcome the presumption of truth that an assertion of 
utility by the Appellant enjoys, the Examiner must establish that it is more likely than not that 
one of ordinary skill in the art would doubt the truth of the statement of utility. Accordingly, 
contrary to the Examiner's assertion, Appellants submit that Hu et al. does not show a lack of
correlation between microarray data and the biological significance of cancer genes. 

Appellants respectfully point out that the analysis by Hu et al. has certain statistical flaws.
According to Hu et al., "different statistical methods" were applied to "estimate the strength of
gene-disease relationships and evaluated the results." (See page 406, left column, emphasis
added). Using these different statistical methods, Hu et al. "[a]ssessed the relative strengths of
gene-disease relationships based on the frequency of both co-citation and single citation." (See
page 411, left column). It is well known in the art that various statistical methods allow different
variables to be manipulated to affect the outcome. For example, the authors admit, "Initial
attempts to search the literature using" the list of genes, gene names, gene symbols, and
frequently used synonyms, generated by the authors "revealed several sources of false positives
and false negatives." (See page 406, right column). The authors further admit that the false
positives caused by "duplicative and unrelated meanings for the term" were "difficult to
manage." Therefore, in order to minimize such false positives, Hu et al. disclose that these terms
"had to be eliminated entirely, thereby reducing the false positive rate but unavoidably
under-representing some genes." (See page 406, right column). Hence, Appellants respectfully submit
that in order to minimize the false positives and negatives in their analysis, Hu et al. manipulated
various aspects of the input data.

Appellants further submit that the statistical analysis by Hu et al. is not a reliable standard
because the frequency of citation only reflects the current research interest in a molecule, not the
true biological function of the molecule. Indeed, the authors acknowledge that "[r]elationships
established by frequency of co-citation do not necessarily represent a true biological link." (See
page 411, right column). One would expect that genes with the greatest change in expression in
a disease would be the first targets of research, and therefore have the strongest known 
relationship to the disease as measured by the number of publications reporting a connection with 
the disease. The correlation reported in Hu only indicates that the greater the change in 
expression level, the more likely it is that there is a published or known role for the gene in the 
disease, as found by their automated literature-mining software. Thus, Hu's results merely 
reflect a bias in the literature toward studying the most prominent targets, and say nothing 
regarding the ability of a gene that is 2-fold or more differentially expressed in tumors to serve as 
a disease marker. 

Even assuming that Hu et al. provide evidence to support a true relationship, the
conclusion in Hu et al. only applies to a specific type of breast tumor (estrogen receptor
(ER)-positive breast tumor) and cannot be generalized as a principle governing microarray study
of breast cancer in general, let alone the various other types of cancer genes in general. In fact,
even Hu et al. admit that, "[i]t is likely that this threshold will change depending on the disease
as well as the experiment. Interestingly, the observed correlation was only found among
ER-positive (breast) tumors not ER-negative tumors." (See page 412, left column). Therefore,
based on these findings, the authors add, "[t]his may reflect a bias in the literature to study the
more prevalent type of tumor in the population. Furthermore, this emphasizes that caution must
be taken when interpreting experiments that may contain subpopulations that behave very
differently." (See page 412, left column; emphasis added).

Furthermore, Hu et al. did not look for a correlation between changes in mRNA and 
changes in protein levels, and therefore their results are not contrary to Appellants' assertion that 
there is a correlation between the two. Appellants are not relying on any "biological role" that
the PRO351 polypeptide has in cancer for its asserted utility. Instead, Appellants are relying on
the overexpression of PRO351 in certain tumors compared to their normal tissue counterparts.
Nowhere in Hu does it say that a lack of correlation in their study means that genes with a less 
than five-fold change in level of expression in cancer cannot serve as a diagnostic marker of 
cancer. 

The Examiner asserts that "Appellant is holding Hu et al. to a higher standard than their 
own specification" for statistical analysis. (Page 10 of the Office Action mailed August 29, 
2005). However, Appellants have compared the level of amplification of the PRO351 gene in
normal tissue and lung tumors and have provided information indicating a greater than 2-fold
amplification. Appellants are not relying on statistical analysis of information obtained from 
published literature based on the current research interest of a molecule, and hence the issues 
regarding statistical analysis of such information do not apply to Appellants' data. 

In the Advisory Action mailed December 7, 2005, the Examiner cited an additional article 
by LaBaer. Appellants respectfully point out that LaBaer is an unreviewed letter to the editor by 
an author of the Hu et al. article describing the automated literature searching tool used in the Hu 
et al. reference discussed above. LaBaer provides no further evidence than that provided in Hu,
and provides no evidence whatsoever to support the conclusion that the results of Hu are
applicable to the diagnostic utility of differentially expressed genes. Importantly, like the Hu
reference, LaBaer does not consider or offer any discussion of whether there is a correlation
between changes in mRNA levels and changes in the level of the encoded protein. In addition,
LaBaer's conclusions regarding disease-independent differences between samples are not
applicable in the instant case, where normal human tissue and human tumor tissue samples were
compared. Accordingly, LaBaer suffers from the same defects discussed above with respect to
Hu et al.

3. Chen et al. 

In further support of the alleged lack of correlation between increased mRNA levels in 
cancer as compared to normal tissues and increased polypeptide levels, the Examiner cited Chen 
et al. as allegedly disclosing that "only 17% of 165 polypeptide spots or 21% of the genes had a
significant correlation between protein and mRNA expression levels in lung adenocarcinoma 
samples." (Page 6 of the Advisory Action mailed December 7, 2005). 

First, Appellants note that proteins selected for study by Chen et al. were those detectable
by staining of 2D gels. As noted in, for example, Haynes et al., made of record by the Examiner
in the Advisory Action mailed December 7, 2005, there are problems with selecting proteins
detectable by 2D gels: "It is apparent that without prior enrichment only a relatively small and
highly selected population of long-lived, highly expressed proteins is observed. There are many
more proteins in a given cell which are not visualized by such methods. Frequently it is the low
abundance proteins that execute key regulatory functions." (page 1870, col. 1). Thus, Chen et
al., by selecting proteins detectable by staining of 2D gels, are likely to have excluded from their
analysis many of the proteins most likely to be significant as cancer markers.

Secondly, Chen et al. looked at expression levels across a set of samples including a large
number of tumor samples (76) along with a much smaller number of normal samples (9). The
tumor samples were taken from stage I and stage III lung adenocarcinomas, which were
classified as bronchoalveolar, bronchial derived or both bronchial and bronchoalveolar derived.
Accordingly, the samples examined were from different tissues in different stages of normal or
cancerous growth. The authors determined the relationship between mRNA and protein
expression by using the average expression values for all samples. The average value for each
protein or mRNA was generated using all 85 lung tissue samples. This resulted in negative
normalized protein values in some cases. Further, the authors chose an arbitrary threshold of
0.115 for the correlation to be considered significant. Accordingly, the Chen paper does not
account for different expression in different tissues or different stages of cancer.

Thirdly, no attempt was made to compare expression levels in normal versus tumor
samples, and in fact the authors concede that they had too few normal samples for meaningful
analysis (page 310, col. 2). As a result, the analysis in the Chen paper shows only that a number 
of randomly selected proteins have varying degrees of correlation between mRNA and protein 
expression levels within a set of different lung adenocarcinoma samples. The Chen paper does 
not address the issue of whether increased mRNA levels in the tumor samples taken together as 
one group, as compared to the normal samples as a group, correlated with increased protein 
levels in tumorous versus normal tissue. Accordingly, the results presented in the Chen paper are 
not applicable to the application at issue. 

The correct test of utility is whether the utility is "more likely than not". In the case of 
the Chen reference, even if the analysis presented is correct (which is disputed), a review of the 
correlation coefficient data presented in the Chen et al. paper indicates that it is more likely than
not that increased mRNA expression correlates with increased protein expression. A review of 
Table I, which lists 66 genes [the paper incorrectly states there are 69 genes listed] for which
only one protein isoform is expressed, shows that 40 genes out of 66 had a positive correlation
between mRNA expression and protein expression. This clearly meets the test of "more likely
than not". Similarly, in Table II, 30 genes with multiple isoforms [again the paper incorrectly
states there are 29] were presented. In this case, for 22 genes out of 30, at least one isoform 
showed a positive correlation between mRNA expression and protein expression. Furthermore, 
12 genes out of 29 showed a strong positive correlation [as determined by the authors] for at least 
one isoform. No genes showed a significant negative correlation. It is not surprising that not all 
isoforms are positively correlated with mRNA expression. Certain isoforms are likely non- 
functional proteins. Thus, Table II also provides that it is more likely than not that protein levels 
will correlate with mRNA expression levels. 

4. Haynes et al. and Gygi et al.

The Examiner has cited the Haynes reference to establish that "even if gene amplification 
correlates with increased transcription, it does not always follow that protein levels are also 
amplified." (Page 5 of the Office Action mailed February 24, 2004; emphasis added). The 
Examiner further states that "Haynes et al. was cited as providing evidence that polypeptide
levels cannot be accurately predicted from mRNA levels." (Page 3 of the Advisory Action
mailed October 14, 2004).

Appellants respectfully point out that Haynes et al. never indicate that the correlation
between mRNA and protein levels does not exist. Haynes et al. only state that "protein levels
cannot be accurately predicted from the level of the corresponding mRNA transcript" (See page
1863, under Section 2.1, last line, emphasis added). This result is expected, since there are many
factors that determine translation efficiency for a given transcript, or the half-life of the encoded
protein. Not surprisingly, Haynes et al. concluded that protein levels cannot always be accurately
predicted from the level of the corresponding mRNA transcript in a single cellular stage or type
when looking at the level of transcripts across different genes.

Importantly, Haynes et al. did not say that for a single gene, a change in the level of 
mRNA transcript is not positively correlated with a change in the level of protein expression. 
Appellants have asserted that increasing the level of mRNA for a particular gene leads to a 
corresponding increase for the encoded protein. Haynes et al. did not study this issue and say
absolutely nothing about it. One cannot look at the level of mRNA across several different genes
to investigate whether a change in the level of mRNA for a particular gene leads to a change in
the level of protein for that gene. Therefore, Haynes et al. is not inconsistent with or
contradictory to the utility of the instant claims, and offers no support for the PTO's rejection of 
Appellants' asserted utility. 

Furthermore, Appellants note that contrary to the Examiner's statement, Haynes teaches 
that "there was a general trend but no strong correlation between protein [expression] and 
transcript levels" (See page 1863, under Section 2.1, emphasis added). For example, in Figure 1, 
there is a positive correlation between mRNA and protein amongst most of the 80 yeast proteins 
studied but the correlation is not linear, hence the authors suggest that one cannot accurately 
predict protein levels from mRNA levels. In fact, very few data points deviated or scattered 
away from the expected normal or showed a lack of correlation between mRNA:protein levels.
Thus, the Haynes data meets the "more likely than not" standard and shows that a positive
correlation exists between mRNA and protein. Therefore, Appellants submit that the Examiner's
rejection is based on a misrepresentation of the scientific data presented in Haynes et al.

Haynes et al. may teach that protein levels cannot be "accurately predicted" from mRNA
levels in the sense that the exact numerical amounts of protein present in a tissue cannot be 
determined based upon mRNA levels. Appellants respectfully submit that the PTO's emphasis 
on the need to "accurately predict" protein levels based on mRNA levels misses the point. The 
asserted utility for the claimed polypeptides is in the diagnosis of cancer. What is relevant to use 
as a cancer diagnostic is relative levels of gene or protein expression, not absolute values, that is, 
that the gene or protein is differentially expressed in tumors as compared to normal tissues. 
Appellants need only show that there is a correlation between mRNA and protein levels, such 
that mRNA overexpression generally predicts protein overexpression. A showing that mRNA
levels can be used to "accurately predict" the precise levels of protein expression is not required.

In the Advisory Action mailed December 7, 2005, the Examiner cited Gygi et al., a study
on which the Haynes reference is based. Like Haynes, the Gygi reference looked at levels of
mRNA at the same growth phase across different genes, not changes in mRNA levels for a single
gene. Thus, when Gygi et al. state that "the correlation between mRNA and protein levels was
insufficient to predict protein expression levels from quantitative mRNA data," the authors are
referring to correlations between constant levels of mRNA and protein at the same growth phase
across different genes, not a correlation between a change in mRNA level and a change in protein
level for the same gene and corresponding protein. Therefore, for the same reasons that Haynes 
is not relevant to Appellants' asserted utility, Gygi likewise offers no support for the PTO's 
rejection of Appellants' asserted utility. 

Furthermore, Appellants submit that Gygi et al. too did not indicate that a correlation
between mRNA and protein levels does not exist. Gygi et al. only state that the correlation may
not be sufficient in accurately predicting protein level from the level of the corresponding
mRNA transcript (Emphasis added) (see page 1270, Abstract). Accurate prediction is not a
criterion that is necessary for meeting the utility standards. Appellants note that the Gygi data
indicate a general trend of correlation between protein [expression] and transcript levels
(Emphasis added). For example, as shown in Figure 5, an mRNA abundance of 250-300
copies/cell correlates with a protein abundance of 500-1000 x 10^3 copies/cell. An mRNA
abundance of 100-200 copies/cell correlates with a protein abundance of 250-500 x 10^3
copies/cell (emphasis added). Therefore, high levels of mRNA generally correlate with high
levels of proteins. In fact, most data points in Figure 5 did not deviate or scatter away from the
general trend of correlation. Thus, the Gygi data meets the "more likely than not" standard and
shows that a positive correlation exists between mRNA and protein. Therefore, Appellants
submit that the Examiner's rejection is based on a misrepresentation of the scientific data
presented in Gygi et al.

5. Lian et al. 

In further support of the alleged lack of correlation between mRNA expression and 
protein expression levels, the PTO has cited Lian et al. for the statement that there is a poor
correlation between mRNA expression and protein abundance in mouse cells, and therefore it 
may be difficult to extrapolate directly from individual mRNA changes to corresponding ones in 
protein levels. (Page 6 of the Office Action mailed August 29, 2005). 

In Lian et al., the authors looked at the mRNA and protein levels of genes in a derived
promyelocyte mouse cell-line during differentiation of the cells from a promyelocyte stage of
development to mature neutrophils following treatment with retinoic acid. The level of mRNA
expression was measured using 3'-end differential display (DD) and oligonucleotide chip array
hybridization to examine the expression of genes at 0, 24, 48 and 72 hours after treatment with 
retinoic acid. Protein levels were qualitatively assessed at 0 and 72 hours after retinoic acid 
treatment following 2-dimensional gel electrophoresis. 

Lian et al. report that they were able to identify 28 proteins which they considered
differentially expressed (page 521). Of those 28, only 18 had corresponding gene expression 
information, and only 13 had measurable levels of mRNA expression (page 521, Table 6). The 
authors then compared the qualitative protein level from the 2-D electrophoresis gel to the 
corresponding mRNA level, and reported that only 4 genes of the 18 present in the database had 
expression levels which were consistent with protein levels (page 521, col. 1). The authors note 
that "[n]one of these was on the list of genes that were differentially expressed significantly
(5-fold or greater change by array or 2-fold or greater change by DD)" (page 521; emphasis added).
Based on these data, the authors conclude "[f]or protein levels based on estimated intensity of
Coomassie dye staining in 2DE, there was poor correlation between changes in mRNA levels and
estimated protein levels" (page 522, col. 2).

The authors themselves admit that there are a number of problems with the data presented
in this reference. At page 520 of this article, the authors explicitly express their concerns by
stating that "[t]hese data must be considered with several caveats: membrane and other
hydrophobic proteins and very basic proteins are not well displayed by the standard 2DE
approach, and proteins presented at low level will be missed. In addition, to simplify MS
analysis, we used a Coomassie dye stain rather than silver to visualize proteins, and this
decreased the sensitivity of detection of minor proteins." (emphasis added). It is known in the art
that Coomassie dye stain is a very insensitive method of measuring protein. This suggests that
the authors relied on a very insensitive measurement of the proteins studied. The conclusions
based on such measurements can hardly be accurate or generally applicable. In particular, the
total number of proteins examined by Lian et al. was only 50 (page 520, col. 2), as compared to
the approximately 7000 genes for which mRNA levels were measured (page 515, col. 1). Thus
the conclusions are based on a very small and atypical set of proteins.

Appellants also emphasize that Appellants are asserting that a measurable change in
mRNA level generally leads to a corresponding change in the level of protein expression, not that
changes in protein level can be used to predict changes in mRNA level. As discussed above,
Lian et al. did not take genes which showed significant mRNA changes and check the
corresponding protein levels. Instead, the authors looked at a small and unrepresentative number
of proteins, and checked the corresponding mRNA levels. Based on the authors' criteria, mRNA
levels were significantly changed if they were at least 5-fold different when measured using a
microchip array, or 2-fold different when using the more sensitive 3'-end differential display
(DD). Of the 28 proteins listed in Table 6, only one has an mRNA level measured by microarray 
which is differentially expressed according to the authors (spot 7: melanoma X-actin, for which 
mRNA changed from 2539 to 341.3, and protein changed from 1 to 3). None of the other 
mRNAs listed in Table 6 show a significant change in expression level when using the criteria 
established by the authors for the less sensitive microarray technique.

There is also one gene in Table 6 whose expression was measured by the more sensitive
technique of DD, and its level increased from a qualitative value of 0 to 2, a more than 2-fold 
increase (spot 2: actin, gamma, cytoplasmic). This increase in mRNA was accompanied by a 
corresponding increase in protein level, from 3 to 6. 

Therefore, although the authors characterize the mRNA and protein levels as having a 
"poor correlation," this does not reflect a lack of a correlation between a change in mRNA level 
and a corresponding change in protein level. Only two genes meet the authors' criteria for 
differentially expressed mRNA level, and of those, one apparently shows a corresponding change 
in protein level and one does not. Thus, there is little basis for the authors' conclusion that "it 
may be difficult to extrapolate directly from individual mRNA changes to corresponding ones in 
protein levels (as estimated from 2DE)." 

Finally, Appellants submit that Lian et al. teach only that protein expression may not
correlate with mRNA level in differentiating myeloid cells and do not teach anything regarding
such a lack of correlation for genes in general. Myeloid cell differentiation relates to
hematopoiesis and is an entirely different biological process from solid tumor development
because these two processes involve entirely different regulatory mechanisms and molecules.
Analysis of surface antigens expressed on myeloid cells of the granulocyte-monocyte-histiocyte
series during differentiation in normal and malignant myelomonocytic cells is useful in
identifying and classifying human leukemias and lymphomas, but cannot be used in diagnosis of
any solid tumors. Therefore, even if the teaching of Lian et al. accurately reflects the correlation
between mRNA and protein for the particular system studied, it cannot apply to the tumor
diagnosis assays of the present application.

6. Fessler et al. 

In further support of the alleged lack of correlation between mRNA expression and 
protein expression levels, the PTO has also cited a publication by Fessler et al. as having "found
a 'poor concordance between mRNA transcript and protein expression changes' in human cells." 
(Page 6 of the Office Action mailed August 29, 2005). Fessler is not contrary to Appellants' 
asserted utility, and actually supports Appellants' assertion that a change in the level of mRNA 
for a particular protein generally leads to a corresponding change in the level of the encoded
protein. As noted above, Appellants make no assertions regarding changes in protein levels
when mRNA levels are unchanged, nor does evidence of changes in protein levels when mRNA 
levels are unchanged have any relevance to Appellants' asserted utility. 

Fessler et al. studied changes in neutrophil (PMN) gene transcription and protein
expression following lipopolysaccharide (LPS) exposure. In Table VIII, Fessler et al. list a
comparison of the change in the level of mRNA for 13 up-regulated proteins and 5 down-
regulated proteins. Of the 13 up-regulated proteins, a change in mRNA levels is reported for
only 3 such proteins. For these 3, mRNA levels are increased in 2 and decreased in the third. Of
the 5 down-regulated proteins, a change in mRNA is reported for 3 such proteins. In all 3,
mRNA levels also are decreased. Thus, in 5 of the 6 cases for which a change in mRNA levels
is reported, the change in the level of mRNA corresponds to the change in the level of the
protein. This is consistent with Appellants' assertion that a change in the level of mRNA for a
particular protein generally leads to a corresponding change in the level of the encoded protein.

Regarding the remainder of the proteins listed in Table VIII, in 6 instances, protein levels 
changed while mRNA levels were unchanged. This evidence has no relevance to Appellants' 
assertion that changes in mRNA levels lead to corresponding changes in protein levels, since 
Appellants are not asserting that changes in mRNA levels are the only cause of changes in 
protein levels. In the final 6 instances listed in Table VIII, protein levels changed while mRNA 
was noted as "absent." This evidence also has no relevance to Appellants' assertion that changes 
in mRNA levels cause corresponding changes in protein levels. By virtue of being "absent," it
is not possible to tell whether mRNA levels were increased, decreased or remained unchanged in
PMN upon contact with LPS. Nothing in these results by Fessler et al. suggests that a change in
the level of mRNA for a particular protein does not generally lead to a corresponding change in 
the level of the encoded protein. Accordingly, these results are not contrary to Appellants' 
assertions. 

The PTO points to Fessler's statement regarding Table VIII that there was "a poor
concordance between mRNA transcript and protein expression changes." (Page 6 of the Office
Action mailed August 29, 2005). As is clear from the above discussion, this statement does not
relate to a lack of correlation between a change in mRNA levels leading to a change in protein
levels, because in 5 of 6 such instances, changes in mRNA and protein levels correlated well.
Instead, this statement relates to observations in which protein levels changed when mRNA was
either unchanged or "absent." As such, this statement is an observation that in addition to
transcriptional activity, LPS also has post-transcriptional and possibly post-translational activity
that affect protein levels, an observation which is not contrary to Appellants' assertions.
Accordingly, Fessler's results are consistent with Appellants' assertion that a change in the mRNA
level for a particular protein generally leads to a corresponding change in the level of the
encoded protein, since 5 of 6 genes demonstrated such a correlation.

7. Greenbaum et al. 

In further support of the alleged lack of correlation between mRNA expression and 
protein expression levels, the Examiner cited a new reference by Greenbaum et al. The Examiner
asserted that Greenbaum et al. teaches that, "To date, there have been only a handful of efforts to
find correlations between mRNA and protein expression levels. And, for the most part, they
have reported only minimal and/or limited correlations." (Page 4 of the Advisory Action mailed
December 7, 2005).

Appellants note that Greenbaum et al. compared the expression of a number of different 
mRNAs and their corresponding proteins in yeast cells. Greenbaum et al. did not compare the 
change of expression of specific mRNAs and their corresponding proteins in cancer cells versus 
normal cells. Accordingly, this reference is also not relevant to the issue at hand. Nevertheless, 
Greenbaum states that logically "we would assume that those ORFs that show a large degree of 
variation in their expression are controlled at the transcriptional level. The variability of the 
mRNA expression is indicative of the cell controlling the mRNA expression at different points of 
the cell cycle to achieve the resulting and desired protein. Thus we would expect and we found 
a high degree of correlation (r = 0.89) between the reference mRNA and protein levels for
these particular ORFs: the cell has already put significant energy into dictating the final
level of protein through tightly controlling the mRNA expression" (page 117.5, col. 1;
emphasis added). Furthermore, Greenbaum states that "we found that ORFs that have higher
than average levels of ribosomal occupancy - that is that a large percentage of their cellular
mRNA concentration is associated with ribosomes (being translated) - have well correlated
mRNA and protein expression levels. (Figure 2)." (page 117.5, col. 2; emphasis added).
Therefore, contrary to the Examiner's assertion, Greenbaum does find high levels of correlation 
between mRNA and protein expression in yeast cells. In particular, Greenbaum demonstrates 
that a high degree of correlation is found for those genes which show a large degree of variability 
in mRNA expression - that is, for those genes which show changes in mRNA expression, the 
change in mRNA expression is correlated with a change in protein expression. 

In summary, Appellants respectfully submit that the Examiner has not shown that gene 
amplification in tumor as compared to normal tissue is not correlated with changes in mRNA and 
protein expression. The Patent Office has failed to meet its initial burden of proof that 
Appellants' claims of utility are not substantial or credible. The arguments presented by the 
Examiner in combination with the Pennica, Konopka, Hu, LaBaer, Chen, Haynes, Gygi, Lian, 
Fessler, and Greenbaum articles do not provide sufficient reasons to doubt the statements by 
Appellants that PRO351 has utility. As discussed above, the law does not require the existence
of a "necessary" correlation between mRNA and protein levels. Nor does the law require that
protein levels be "accurately predicted." According to the authors themselves, the data in the
above cited references confirm that there is a general trend between protein expression and
transcript levels, which meets the "more likely than not" standard, and show that a positive
correlation exists between mRNA and protein. Therefore, Appellants submit that the Examiner's
reasoning is based on a misrepresentation of the scientific data presented in the above cited
references and application of an improper, heightened legal standard. In fact, contrary to what the
Examiner contends, the art indicates that, if a gene is overexpressed in cancer, it is more likely 
than not that the encoded protein will also be expressed at an elevated level. 

D. It is "more likely than not" for amplified genes to have increased mRNA and 
protein levels 

Appellants have submitted ample evidence to show that, in general, if a gene is amplified 
in cancer, it is more likely than not that the encoded protein will be expressed at an elevated 
level. First, the articles by Orntoft et al., Hyman et al., and Pollack et al. (made of record in
Appellants' Response filed July 28, 2004) collectively teach that in general gene amplification
increases mRNA expression. Second, the Declaration of Dr. Paul Polakis, principal investigator
of the Tumor Antigen Project of Genentech, Inc., the assignee of the present application, shows
that, in general, there is a correlation between mRNA levels and polypeptide levels. Thus, taken
together, all of the submitted evidence supports Appellants' position that gene amplification is 
more likely than not predictive of increased mRNA and polypeptide levels. 

Appellants submit that there are numerous articles which show that generally, if a gene is 
amplified in cancer, it is more likely than not that the mRNA transcript will be expressed at an 
elevated level. For example, Orntoft et al. (Mol. and Cell. Proteomics, 2002, vol. 1, pages 37-45 -
made of record in Appellants' Response filed July 28, 2004) studied transcript levels of 5600
genes in malignant bladder cancers, many of which were linked to the gain or loss of
chromosomal material using an array-based method. Orntoft et al. showed that there was a gene
dosage effect and taught that "in general (18 of 23 cases) chromosomal areas with more than 2-
fold gain of DNA showed a corresponding increase in mRNA transcripts" (see column 1,
abstract). In addition, Hyman et al. (Cancer Res., 2002, vol. 62, pages 6240-45 - made of record
in Appellants' Response filed July 28, 2004) showed, using CGH analysis and cDNA
microarrays which compared DNA copy numbers and mRNA expression of over 12,000 genes in
breast cancer tumors and cell lines, that there was "evidence of a prominent global influence of
copy number changes on gene expression levels." (See page 6244, column 1, last paragraph).
Additional supportive teachings were also provided by Pollack et al. (PNAS, 2002, vol. 99,
pages 12963-12968 - made of record in Appellants' Response filed July 28, 2004) who studied a
series of primary human breast tumors and showed that "... 62% of highly amplified genes show
moderately or highly elevated expression, and DNA copy number influences gene expression
across a wide range of DNA copy number alterations (deletion, low-, mid- and high-level
amplification), and that on average, a 2-fold change in DNA copy number is associated with a
corresponding 1.5-fold change in mRNA levels." Thus, these articles collectively teach that in
general, gene amplification increases mRNA expression.

In addition, in their Response filed July 28, 2004, Appellants submitted a Declaration by 
Dr. Polakis, principal investigator of the Tumor Antigen Project of Genentech, Inc., the assignee 
of the present application, to show that mRNA expression correlates well with protein levels, in 
general. As Dr. Polakis explains, the primary focus of the microarray project was to identify
tumor cell markers useful as targets for both the diagnosis and treatment of cancer in humans.
The scientists working on the project extensively rely on results of microarray experiments in 
their effort to identify such markers. As Dr. Polakis explains, using microarray analysis, 
Genentech scientists have identified approximately 200 gene transcripts (mRNAs) that are 
present in human tumor cells at significantly higher levels than in corresponding normal human 
cells. To the date of the Declaration, they have generated antibodies that bind to about 30 of the 
tumor antigen proteins expressed from these differentially expressed gene transcripts and have 
used these antibodies to quantitatively determine the level of production of these tumor antigen 
proteins in both human cancer cells and corresponding normal cells. Having compared the levels 
of mRNA and protein in both the tumor and normal cells analyzed, they found a very good 
correlation between mRNA and corresponding protein levels. Specifically, in approximately 
80% of their observations they have found that increases in the level of a particular mRNA
correlate with changes in the level of protein expressed from that mRNA. While the proper
legal standard is to show that the existence of correlation between mRNA and polypeptide levels
is more likely than not, the showing of approximately 80% correlation for the molecules tested
according to the Polakis Declaration greatly exceeds this legal standard. Based on these
experimental data and his vast scientific experience of more than 20 years, Dr. Polakis states that, 
for human genes, increased mRNA levels typically correlate with an increase in abundance of the 
encoded protein. He further confirms that "it remains a central dogma in molecular biology that 
increased mRNA levels are predictive of corresponding increased levels of the encoded protein." 

Appellants further note that the sale of gene expression chips to measure mRNA levels is 
a highly successful business, with a company such as Affymetrix recording 168.3 million dollars 
in sales of their GeneChip arrays in 2004. Clearly, the research community believes that the 
information obtained from these chips is useful (i.e., that it is more likely than not informative of 
the protein level). 

In the Advisory Action mailed October 14, 2004, the Examiner asserted that "Orntoft et
al. do not appear to look at gene amplification, mRNA levels and polypeptide levels from a
single gene at a time. . . . Orntoft et al. concentrated on regions of chromosomes with strong gains
of chromosomal material containing clusters of genes (p.40). This analysis was not done for
PRO351 in the instant specification. That is, it is not clear whether or not PRO351 is in a gene
cluster in a region of a chromosome that is highly amplified. Therefore, the relevance of Orntoft
et al. is not clear." (Pages 4-5 of the Advisory Action mailed October 14, 2004). The Examiner
further asserted that "Hyman et al. used the same CGH approach in their research. Less than half
(44%) of highly amplified genes showed mRNA overexpression (abstract).... Therefore, Hyman
et al. also do not support utility of the polypeptides of the instant invention." (Page 5 of the
Advisory Action mailed October 14, 2004). The Examiner further asserted that "Pollack et al.
also used CGH technology, concentrating on large chromosome regions showing high 
amplification (p. 12965). Pollack et al. did not investigate polypeptide levels. Therefore, 
Pollack et al. also do not support the asserted utility of the claimed invention." (Page 5 of the 
Advisory Action mailed October 14, 2004). 

In Orntoft et al., 1,800 genes that yielded an increase or decrease in mRNA expression in
two invasive tumors compared to the two non-invasive papillomas were then mapped to
chromosomal locations. The chromosomes had already been analyzed for amplification by
hybridizing tumor DNA to normal metaphase chromosomes (CGH). Orntoft et al. used CGH
alterations as the independent variable and estimated the frequency of expression alterations of
the 1,800 genes in the chromosomal areas. Orntoft et al. found that in general (77% and 80%
concordance) areas with a strong gain of chromosomal material contained a cluster of genes
having increased mRNA expression (see page 40). Orntoft et al. state, "For both tumors
TCC733 (p<0.015) and TCC827 (p<0.00003) a highly significant correlation was observed
between the level of CGH ratio change (reflecting the DNA copy number) and alterations
detected by the array based technology" (see page 41, column 1). Orntoft et al. also studied the
relationship between altered mRNA and protein levels using 2D-PAGE analysis. Orntoft et al.
state, "In general there was a highly significant correlation (p<0.005) between mRNA and protein
alterations.... 26 well focused proteins whose genes had a known chromosomal location were
detected in TCCs 733 and 335, and of these 19 correlated (p<0.005) with the mRNA changes
detected using the arrays." (See page 42, column 2 to page 43, column 2). Accordingly, Orntoft
et al. clearly support Appellants' position that proteins expressed by genes that are amplified in
tumors are useful as cancer markers.

The Examiner indicates that Appellants have not indicated whether PRO351 is in a gene
cluster region of a chromosome. (Page 5 of the Advisory Action mailed October 14, 2004).
Appellants fail to see how this is relevant to the analysis. Orntoft et al. did not limit their
findings to only those regions of amplified gene clusters. Further, as discussed below, Hyman et
al. and Pollack et al. did gene-by-gene analysis across all chromosomes.

The Examiner has asserted that "Orntoft et al. could only compare the levels of about 40 
well-resolved and focused abundant proteins." (Page 7 of the Office Action mailed August 29, 
2005; emphasis in original). The Examiner further asserts that "Appellants have provided no fact 
or evidence concerning a lack of correlation between the specification's disclosure of low levels 
of amplification of DNA (which were not characterized on the basis of those in the Orntoft 
publication) and an associated rise in level of the encoded protein." (Page 7 of the Office Action 
mailed August 29, 2005). 

Appellants respectfully point out that while technical considerations did prevent Orntoft 
et al. from evaluating a larger number of proteins, the ones they did look at showed a clear 
correlation between mRNA and protein expression levels. As Orntoft et al. state, "In general 
there was a highly significant correlation (p<0.005) between mRNA and protein alterations.... 
26 well focused proteins whose genes had a known chromosomal location were detected in TCCs 
733 and 335, and of these 19 correlated (p<0.005) with the mRNA changes detected using the 
arrays." (See page 42, column 2 to page 43, column 2). Accordingly, Orntoft et al. clearly
support Appellants' position that proteins expressed by genes that are amplified in tumors are 
useful as cancer markers. 

In addition, as discussed above, the levels of amplification for PRO351 were not "low" but
significant, and ranged from 2.03-fold to 2.75-fold. Appellants note that the levels of gene
amplification observed by Orntoft et al. were relatively low, averaging only 0.3-0.4 fold (page
40, col. 1). In particular, the level of gene amplification associated with expression changes was
only around two-fold (see Figure 2), as compared to the 2.04-fold to 2.75-fold amplification
observed for PRO351. Even at these levels of gene amplification, Orntoft et al. found that "[i]n
most cases, chromosomal gains detected by CGH were accompanied by an increased level of
transcripts in both TCCs 733 (77%) and 827 (80%)" (page 40, col. 2; emphasis added). The
level of correlation between DNA copy number and increased mRNA levels observed by Orntoft
et al., from 77-80%, clearly meets the standard of more likely than not. Orntoft et al. also found
a "highly significant" correlation between mRNA and protein levels, with the two data sets
studied having correlations of 39/40 (98%) and 19/26 (73%) (pages 42-43).

Appellants respectfully submit that the Examiner has mischaracterized the methods used
by Hyman et al. and Pollack et al. in their analysis. These papers did not use traditional CGH
analysis to identify amplified genes. In Hyman et al., 13,824 cDNA clones were placed on glass
slides in a microarray and genomic DNA from breast cancer cell lines and normal human WBCs
were hybridized to the cDNA sequences. For expression analysis, RNA from tumor cell lines
was hybridized on the same microarrays. The 13,824 arrayed cDNA clones were analyzed for
gene expression and gene copy number in 14 breast cancer cell lines. Further, Hyman et al. state
that "[t]he cDNA/CGH microarray technique enables the direct correlation of copy number and
expression data on a gene-by-gene basis throughout the genome." (See page 6242, column 2).
Therefore, the analysis performed by Hyman et al. was on a gene-by-gene basis, and clearly
shows that "it is more likely than not" that a gene which is amplified in tumor cells will have
increased gene expression.

The Examiner also appears to misunderstand the data presented by Hyman et al. The
Examiner has asserted that "of the 12,000 transcripts analyzed, a set of 270 was identified in
which overexpression was attributable to gene amplification." The Examiner concludes that
"[t]his proportion is 2%; the Examiner maintains that 2% does not provide a reasonable
expectation that the slight amplification of PRO351 would be correlated with elevated levels of
mRNA, much less protein." (Page 7 of the Office Action mailed August 29, 2005). Appellants
respectfully submit that the Examiner appears to have misinterpreted the results of Hyman et al.
Hyman et al. chose to do a genome-wide analysis of a large number of genes, most of which, as
shown in Figure 2, were not amplified. Accordingly, the 2% number is meaningless, as the low
figure mainly results from the fact that only a small percentage of genes are amplified in the first 
place. The significant figure is not the percentage of genes in the genome that show 
amplification, but the percentage of amplified genes that demonstrate increased mRNA and 
protein expression.

The Examiner has further asserted that the Hyman reference "found 44% of highly
amplified genes showing overexpression at the mRNA level, and 10.5% of highly overexpressed 
genes being amplified; thus, even at the level of high amplification and high overexpression, the 
two do not correlate." (Page 7 of the Office Action mailed August 29, 2005). Appellants submit 
that the 10.5% figure is not relevant to the issue at hand. One of skill in the art would understand 
that there can be more than one cause of overexpression. The issue is not whether 
overexpression is always, or even typically caused by gene amplification, but rather, whether 
gene amplification typically leads to overexpression. 

The Examiner's assertion is not consistent with the interpretation Hyman et al.
themselves place on their data, stating that, "The results illustrate a considerable influence of
copy number on gene expression patterns." (page 6242, col. 1; emphasis added). In the more
detailed discussion of their results, Hyman et al. teach that "[u]p to 44% of the highly amplified
transcripts (CGH ratio, >2.5) were overexpressed (i.e., belonged to the global upper 7% of
expression ratios) compared with only 6% for genes with normal copy number." (See page
6242, col. 1; emphasis added). These details make it clear that Hyman et al. set a highly
restrictive standard for considering a gene to be overexpressed; yet almost half of all highly
amplified transcripts met even this highly restrictive standard. Therefore, the analysis performed
by Hyman et al. clearly shows that "it is more likely than not" that a gene which is amplified in
tumor cells will have increased gene expression.

In Pollack et al., DNA copy number alteration across 6,691 mapped human genes in 44 
predominantly advanced primary breast tumors and 10 breast cancer cell lines was profiled. 
Pollack et al. further state, "Parallel microarray measurements of mRNA levels reveal the
remarkable degree to which variation in gene copy number contributes to variation in gene
expression in tumor cells." (See Abstract). "Genome-wide, of 117 high-level DNA
amplifications (fluorescence ratios >4, and representing 91 different genes), 62% (representing
54 different genes; . . .) are found associated with at least moderately elevated mRNA levels
(mean-centered fluorescence ratios >2), and 42% (representing 36 different genes) are found
associated with comparably highly elevated mRNA levels (mean-centered fluorescence ratios
>4)." (See page 12966, column 1). Therefore, the analysis performed by Pollack et al. was also
on a gene-by-gene basis, and clearly shows that "it is more likely than not" that a gene which is
amplified in tumor cells will have increased gene expression.

The Examiner has asserted that Pollack et al. is allegedly "limited to highly amplified
genes which were not evaluated by the method of the instant specification." (Page 7 of the
Office Action mailed August 29, 2005). Appellants respectfully submit that there is no
disclosure in Pollack et al. which indicates that it is limited to regions showing high
amplification. Pollack et al. state that their method had the sensitivity to detect 1.5, 2 or 2.5 fold
gains in single copy DNA (page 12964), and report that "on average, a 2-fold change in DNA
copy number is associated with a corresponding 1.5-fold change in mRNA levels" (Abstract). As
discussed above, the PRO351 gene showed at least 2-fold amplification in ten different lung
tumors; thus this is well within the range shown by Pollack et al. to produce corresponding
changes in mRNA levels.

The Examiner has further asserted that "none of the three papers reported that the 
research was relevant to identifying probes that can be used as cancer diagnostics." (Page 5 of 
the Advisory Action mailed October 14, 2004). Appellants respectfully point out that Hyman et
al. conducted additional studies of one of the genes found to be amplified, HOXB7, and found "a
clinical association between HOXB7 amplification and poor patient prognosis." (Page
6244, col. 1 to col. 2; emphasis added). Thus the results of Hyman et al. confirm that genes which
are amplified in tumors have prognostic utility. The Board's attention is also respectfully
directed to the final paragraph of Pollack et al., wherein the authors conclude that "a substantial
portion of the phenotypic uniqueness (and, by extension, the heterogeneity in clinical behavior)
among patients' tumors may be traceable to underlying variation in DNA copy number." (Page
12968, col. 2). Accordingly, Pollack et al. confirm that genes that are amplified in at least one
type of tumor are useful as markers for that type of tumor, and for prognostic uses directed to that
type of tumor.

Thus these articles, Orntoft, Hyman and Pollack, collectively teach that in general gene
amplification increases mRNA expression.

With respect to the correlation between mRNA expression and protein levels, the 
Examiner has asserted that the Polakis Declaration is insufficient to overcome the rejection of
claims 58-62 since it is limited to a discussion of data regarding the correlation of mRNA levels
and polypeptide levels and not gene amplification levels. The Examiner has asserted that the 
Declaration does not provide data such that the Examiner can independently draw conclusions. 
(Page 6 of the Advisory Action mailed October 14, 2004). 

Appellants submit that Dr. Polakis' Declaration was presented to support the position that 
there is a correlation between mRNA levels and polypeptide levels, the correlation between gene 
amplification and mRNA levels having already been established by the data shown in the Orntoft
et al., Hyman et al., and Pollack et al. articles. Appellants emphasize that the opinions expressed
in the Polakis Declaration, including the quoted statement, are all based on factual findings. 
Subsequently, antibodies binding to about 30 of these tumor antigens were prepared, and mRNA 
and protein levels were compared. In approximately 80% of the cases, the researchers found that 
increases in the level of a particular mRNA correlated with changes in the level of protein 
expressed from that mRNA when human tumor cells are compared with their corresponding 
normal cells. Dr. Polakis' statement that "an increased level of mRNA in a tumor cell relative to 
a normal cell typically correlates to a similar increase in abundance of the encoded protein in the 
tumor cell relative to the normal cell" is based on factual experimental findings, clearly set forth
in the Declaration. Accordingly, the Declaration is not merely conclusive, and the fact-based 
conclusions of Dr. Polakis would be considered reasonable and accurate by one skilled in the art. 

The case law has clearly established that in considering affidavit evidence, the Examiner 
must consider all of the evidence of record anew. 18 "After evidence or argument is submitted by 
the Appellant in response, patentability is determined on the totality of the record, by a 
preponderance of the evidence with due consideration to persuasiveness of argument." 19
Furthermore, the Federal Court of Appeals held in In re Alton, "[W]e are aware of no reason why 
opinion evidence relating to a fact issue should not be considered by an examiner." 20 Appellants 

18 In re Rinehart, 531 F.2d 1084, 189 U.S.P.Q. 143 (C.C.P.A. 1976); In re Piasecki, 745 F.2d 1015, 226
U.S.P.Q. 881 (Fed. Cir. 1985).

19 In re Alton, 37 U.S.P.Q.2d 1578, 1584 (Fed. Cir. 1996) (quoting In re Oetiker, 977 F.2d 1443, 1445, 24
U.S.P.Q.2d 1443, 1444 (Fed. Cir. 1992)).

20 Id. at 1583.

also respectfully draw the Examiner's attention to the Utility Examination Guidelines which 
state, "Office personnel must accept an opinion from a qualified expert that is based upon 
relevant facts whose accuracy is not being questioned; it is improper to disregard the opinion 
solely because of a disagreement over the significance or meaning of the facts offered." 21
The statement in question from an expert in the field (the Polakis Declaration) states that "it is 
my considered scientific opinion that for human genes, an increased level of mRNA in a tumor 
cell relative to a normal cell typically correlates to a similar increase in abundance of the encoded 
protein in the tumor cell relative to the normal cell." Therefore, barring evidence to the contrary 
regarding the above statement in the Polakis Declaration, this rejection is improper under both 
the case law and the Utility guidelines. 

The Examiner asserts that "there is strong opposing evidence showing that gene 
amplification is not predictive of increased mRNA levels in normal and cancerous tissues and, in 
turn, that increased mRNA levels are frequently not predictive of increased polypeptide levels." 
(Page 6 of the Advisory Action mailed December 7, 2005). In support of this assertion, the 
Examiner refers to the previously cited references by Pennica et al., Konopka et al., Hu et al.,
Haynes et al., Lian et al., and Fessler et al., as well as newly cited references by Chen et al.,
LaBaer, Gygi et al., and Greenbaum et al.

Appellants respectfully submit that, as discussed in detail above, the arguments presented 
by the Examiner in combination with the Pennica, Konopka, Hu, LaBaer, Chen, Haynes, Gygi, 
Lian, Fessler, and Greenbaum articles do not provide sufficient reasons to doubt the statements 
by Appellants that PRO351 has utility. As discussed above, the law does not require the
existence of a "necessary" correlation between mRNA and protein levels. Nor does the law
require that protein levels be "accurately predicted." According to the authors themselves, the
data in the above cited references confirm that there is a general trend between protein expression
and transcript levels, which meets the "more likely than not" standard, and show that a positive
correlation exists between mRNA and protein.



21 Part IIB, 66 Fed. Reg. 1098 (2001).

Taken together, although there are some examples in the scientific art that do not fit
within the central dogma of molecular biology that there is a correlation between polypeptide and
mRNA levels, these instances are exceptions rather than the rule. In the majority of amplified
genes, the teachings in the art, as exemplified by Orntoft et al., Hyman et al., Pollack et al., and
the Polakis Declaration, overwhelmingly show that gene amplification influences gene
expression at the mRNA and protein levels. Therefore, one of skill in the art would reasonably
expect in this instance, based on the amplification data for the PRO351 gene, that the PRO351
polypeptide is concomitantly overexpressed. Thus, Appellants submit that the PRO351
polypeptide and the claimed antibodies that bind it have utility in the diagnosis of cancer.

E. Even if a prima facie case of lack of utility has been established, it should be 
withdrawn on consideration of the totality of evidence 

Even if one assumes arguendo that it is more likely than not that there is no correlation
between gene amplification and increased mRNA/protein expression, which Appellants submit is
not true, a polypeptide encoded by a gene that is amplified in cancer would still have a specific,
substantial, and credible utility. In support, Appellants respectfully draw the Board's attention to
page 2 of the Declaration of Dr. Avi Ashkenazi (submitted with the Response filed April 29,
2004) which explains that,

even when amplification of a cancer marker gene does not result in significant 
over-expression of the corresponding gene product, this very absence of gene 
product over-expression still provides significant information for cancer diagnosis 
and treatment. Thus, if over-expression of the gene product does not parallel gene 
amplification in certain tumor types but does so in others, then parallel monitoring 
of gene amplification and gene product over-expression enables more accurate 
tumor classification and hence better determination of suitable therapy. In 
addition, absence of over-expression is crucial information for the practicing 
clinician. If a gene is amplified but the corresponding gene product is not over- 
expressed, the clinician accordingly will decide not to treat a patient with agents 
that target that gene product. 

Appellants thus submit that simultaneous testing of gene amplification and gene product
over-expression enables more accurate tumor classification, even if the gene product, the protein,
is not over-expressed. This leads to better determination of a suitable therapy. Further, as
explained in Dr. Ashkenazi's Declaration, absence of over-expression of the protein itself is
crucial information for the practicing clinician. If a gene is amplified in a tumor, but the
corresponding gene product is not over-expressed, the clinician will decide not to treat a patient
with agents that target that gene product. This not only saves money, but also has the benefit that
the patient can avoid exposure to the side effects associated with such agents.

This utility is further supported by the teachings of the article by Hanna and Mornin
(Pathology Associates Medical Laboratories, August (1999); submitted in the IDS filed July 28,
2004). The article teaches that the HER-2/neu gene has been shown to be amplified and/or over- 
expressed in 10%-30% of invasive breast cancers and in 40%-60% of intraductal breast 
carcinomas. Further, the article teaches that diagnosis of breast cancer includes testing both the 
amplification of the HER-2/neu gene (by FISH) as well as the over-expression of the HER-2/neu 
gene product (by IHC). Even when the protein is not over-expressed, the assay relying on both 
tests leads to a more accurate classification of the cancer and a more effective treatment of it. 

The Examiner has asserted that Hanna et al. allegedly "show that gene amplification does
not reliably correlate with protein over-expression, and thus the level of polypeptide expression
must be tested empirically." (Pages 9-10 of the Office Action mailed August 29, 2005).
Appellants respectfully point out that the Examiner appears to have misread Hanna et al. Hanna
et al. clearly state that gene amplification (as measured by FISH) and polypeptide expression (as
measured by immunohistochemistry, IHC) are well correlated ("in general, FISH and IHC results
correlate well" (Hanna et al., p. 1, col. 2)). It is only a subset of tumors which show discordant
results. Thus, Hanna et al. support Appellants' position that it is more likely than not that gene
amplification correlates with increased polypeptide expression.

Appellants have clearly shown that the gene encoding the PRO351 polypeptide is
amplified in at least ten primary lung tumors. Therefore, the PRO351 gene, similar to the
HER-2/neu gene disclosed in Hanna et al., is a tumor associated gene. Furthermore, as discussed
above, in the majority of amplified genes, the teachings in the art overwhelmingly show that gene
amplification influences gene expression at the mRNA and protein levels. Therefore, one of skill
in the art would reasonably expect in this instance, based on the amplification data for the
PRO351 gene, that the PRO351 polypeptide is concomitantly overexpressed.

However, even if gene amplification does not result in overexpression of the gene product
(i.e., the protein), an analysis of the expression of the protein is useful in determining the course
of treatment, as supported by the Ashkenazi Declaration. The Examiner has asserted that "there
is no indication that the PRO351 protein levels increase or stay the same. Further research would
be needed to determine PRO351 protein levels in cancers showing gene amplification of the
PRO351 gene." (Page 4 of the Office Action mailed June 16, 2004). The Examiner appears to
view the testing described in the Ashkenazi Declaration and the Hanna paper as experiments
involving further characterization of the PRO351 polypeptide itself. In fact, such testing is for
the purpose of characterizing not the PRO351 polypeptide, but the tumors in which the gene
encoding PRO351 is amplified. The PRO351 polypeptide is therefore useful in tumor
categorization, the results of which become an important tool in the hands of a physician
enabling the selection of a treatment modality that holds the most promise for the successful
treatment of a patient.

For the reasons given above, Appellants respectfully submit that the present specification 
clearly describes, details and provides a patentable utility for the claimed invention. 
Accordingly, Appellants respectfully request reconsideration and reversal of the rejections of 
Claims 58-62 under 35 U.S.C. §101.

ISSUE II: Claims 58-62 satisfy the enablement requirement of 35 U.S.C. §112, first 
paragraph. 

Claims 58-62 stand rejected under 35 U.S.C. §112, first paragraph, allegedly "since the 
claimed invention is not supported by either a credible, specific and substantial asserted utility or a 
well established utility for the reasons set forth above, one skilled in the art clearly would not know 
how to use the claimed invention." (Pages 2-3 of the Office Action mailed August 29, 2005). 

In this regard, Appellants refer to the arguments and information presented above in
response to the outstanding rejection under 35 U.S.C. § 101, wherein those arguments are
incorporated by reference herein. Appellants respectfully submit that as described above, the
PRO351 polypeptide has utility in the diagnosis of cancer and based on such a utility, one of skill
in the art would know exactly how to use the claimed antibodies that bind the PRO351
polypeptide for diagnosis of cancer, without undue experimentation.

Accordingly, Appellants respectfully request reconsideration and reversal of the
enablement rejection of Claims 58-62 under 35 U.S.C. §112, first paragraph.

CONCLUSION



For the reasons given above, Appellants submit that the specification discloses at least 
one patentable utility for the antibodies of Claims 58-62, and that one of ordinary skill in the art 
would understand how to use the claimed antibodies, for example in the diagnosis of lung
tumors. Therefore, Claims 58-62 meet the requirements of 35 U.S.C. §101 and 35 U.S.C. §112,
first paragraph. 

Accordingly, reversal of all the rejections of Claims 58-62 is respectfully requested. 
Please charge any additional fees, including fees for additional extension of time, or
credit overpayment to Deposit Account No. 08-1641 (referencing Attorney's Docket

8. CLAIMS APPENDIX

Claims on Appeal 

58. An isolated antibody that specifically binds to the polypeptide of SEQ ID NO: 132.

59. The antibody of Claim 58 which is a monoclonal antibody.

60. The antibody of Claim 58 which is a humanized antibody.

61. An antigen binding fragment of the antibody of Claim 58.

62. The antibody of Claim 58 which is labeled.

9. EVIDENCE APPENDIX

1. Declaration of Avi Ashkenazi, Ph.D. under 37 C.F.R. §1.132; with attached Exhibit A 
(Curriculum Vitae). 

2. Declaration of Paul Polakis, Ph.D. under 37 C.F.R. §1.132. 

3. Orntoft, T.F., et al., "Genome-wide Study of Gene Copy Numbers, Transcripts, and 
Protein Levels in Pairs of Non-Invasive and Invasive Human Transitional Cell Carcinomas," 
Molecular & Cellular Proteomics 1:37-45 (2002). 

4. Hyman, E., et al., "Impact of DNA Amplification on Gene Expression Patterns in Breast
Cancer," Cancer Research 62:6240-6245 (2002).

5. Pollack, J.R., et al., "Microarray Analysis Reveals a Major Direct Role of DNA Copy
Number Alteration in the Transcriptional Program of Human Breast Tumors," Proc. Natl. Acad.
Sci. USA 99:12963-12968 (2002).

6. Hanna, J.S., et al., "HER-2/neu Breast Cancer Predictive Testing," Pathology Associates 
Medical Laboratories (1999). 

7. Declaration of Audrey D. Goddard, Ph.D. under 37 C.F.R. §1.132, with attached Exhibits 
A-G: 

A. Curriculum Vitae of Audrey D. Goddard, Ph.D. 

B. Higuchi, R. et al., "Simultaneous amplification and detection of specific DNA 
sequences," Biotechnology 10:413-417 (1992). 

C. Livak, K.J., et al., "Oligonucleotides with fluorescent dyes at opposite ends provide a 
quenched probe system useful for detecting PCR product and nucleic acid hybridization," PCR 
Methods Appl. 4:357-362 (1995). 

D. Heid, C.A. et al., "Real time quantitative PCR," Genome Res. 6:986-994 (1996).

E. Pennica, D. et al., "WISP genes are members of the connective tissue growth factor
family that are up-regulated in Wnt-1-transformed cells and aberrantly expressed in human colon
tumors," Proc. Natl. Acad. Sci. USA 95:14717-14722 (1998).

F. Pitti, R.M. et al., "Genomic amplification of a decoy receptor for Fas ligand in lung and
colon cancer," Nature 396:699-703 (1998).

G. Bieche, I. et al., "Novel approach to quantitative polymerase chain reaction using real-
time detection: Application to the detection of gene amplification in breast cancer," Int. J. Cancer
78:661-666 (1998).

8. Pennica, D. et al., "WISP genes are members of the connective tissue growth factor family
that are up-regulated in Wnt-1-transformed cells and aberrantly expressed in human colon tumors,"
Proc. Natl. Acad. Sci. USA 95:14717-14722 (1998).

9. Konopka, J.B. et al., "Variable expression of the translocated c-abl oncogene in
Philadelphia-chromosome-positive B-lymphoid cell lines from chronic myelogenous leukemia
patients," Proc. Natl. Acad. Sci. USA 83:4049-4052 (1986).

10. Haynes, P. A. et al., "Proteome analysis: Biological assay or data archive?" 
Electrophoresis 19:1862-1871 (1998). 

11. Hu, Y. et al., "Analysis of genomic and proteomic data using advanced literature mining,"
Journal of Proteome Research 2:405-412 (2003).

12. Hittelman, W., "Genetic instability in epithelial tissues at risk for cancer," Ann. NY Acad.
Sci. 952:1-12 (2001).

13. Fessler, M. B. et al., "A genomic and proteomic analysis of activation of the human 
neutrophil by lipopolysaccharide and its mediation by p38 mitogen-activated protein kinase," J. 
Biol. Chem. 277:31291-31302 (2002). 

14. Lian, Z. et al., "Genomic and proteomic analysis of the myeloid differentiation program,"
Blood 98:513-524 (2001).

15. Greenbaum, D. et al., "Comparing protein abundance and mRNA expression levels on a
genomic scale," Genome Biol. 4:117.1-117.8 (2003).

16. Chen, G. et al., "Discordant protein and mRNA expression in lung adenocarcinomas,"
Mol Cell Proteomics 1:304-313 (2002).

17. LaBaer, J., "Mining the literature and large datasets," Nature Biotechnology 21:976-977
(2003).

18. Gygi, S. P. et al., "Correlation between protein and mRNA abundance in yeast," Mol Cell
Biol 19:1720-1730 (1999).

Item 1 was submitted with Appellants' Response filed April 29, 2004, and acknowledged as having 
been entered into the record by the Examiner in the Office Action mailed June 16, 2004. 

Item 2 was submitted with Appellants' Response filed July 28, 2004, and acknowledged as having 
been considered by the Examiner in the Advisory Action mailed October 14, 2004. 

Items 3-6 were made of record by Appellants in their IDS filed July 28, 2004, and initialed as 
having been considered by the Examiner on October 12, 2004. 

Item 7 was submitted with Appellants' Response filed and acknowledged as having been 
considered by the Examiner in the Office Action mailed 

Items 8-10 were first cited by the Examiner in the Office Action mailed February 24, 2004, and 
were made of record in the Advisory Action mailed December 7, 2005. 

Item 11 was first cited by the Examiner in the Advisory Action of October 14, 2004, and was made
of record by the Examiner in the Advisory Action mailed December 7, 2005. 

Items 12-14 were made of record by the Examiner in the Office Action mailed August 29, 2005, 
and again in the Advisory Action mailed December 7, 2005. 

Items 15-18 were made of record by the Examiner in the Advisory Action mailed December 7,
2005.

10. RELATED PROCEEDINGS APPENDIX

None.

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE

Applicant : Ashkenazi et al.
App. No. : 09/903,925
Filed : July 11, 2001
For : SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC ACIDS ENCODING THE SAME
Examiner : Hamud, Fozia M
Group Art Unit 1647

CERTIFICATE OF EXPRESS MAILING

I hereby certify that this correspondence is being deposited with the United States
Postal Service with sufficient postage as first class mail in an envelope addressed to
Commissioner of Patents, Washington D.C. 20231 on:

(Date)

Commissioner of Patents
P.O. Box 1450
Alexandria, VA 22313-1450

DECLARATION OF AVI ASHKENAZI, Ph.D. UNDER 37 C.F.R. § 1.132

I, Avi Ashkenazi, Ph.D. declare and say as follows:

1. I am Director and Staff Scientist at the Molecular Oncology Department of
Genentech, Inc., South San Francisco, CA 94080.

2. I joined Genentech in 1988 as a postdoctoral fellow. Since then, I have 
investigated a variety of cellular signal transduction mechanisms, including apoptosis, and have 
developed technologies to modulate such mechanisms as a means of therapeutic intervention in 
cancer and autoimmune disease. I am currently involved in the investigation of a series of
secreted proteins over-expressed in tumors, with the aim to identify useful targets for the 
development of therapeutic antibodies for cancer treatment. 

3. My scientific Curriculum Vitae, including my list of publications, is attached to 
and forms part of this Declaration (Exhibit A). 

4. Gene amplification is a process in which chromosomes undergo changes to
contain multiple copies of certain genes that normally exist as a single copy, and is an important
factor in the pathophysiology of cancer. Amplification of certain genes (e.g., Myc or Her2/Neu)
gives cancer cells a growth or survival advantage relative to normal cells, and might also provide
a mechanism of tumor cell resistance to chemotherapy or radiotherapy.

5. If gene amplification results in over-expression of the mRNA and the 
corresponding gene product, then it identifies that gene product as a promising target for cancer 
therapy, for example by the therapeutic antibody approach. Even in the absence of over- 
expression of the gene product, amplification of a cancer marker gene - as detected, for example, 
by the reverse transcriptase TaqMan® PCR or the fluorescence in situ hybridization (FISH)
assays - is useful in the diagnosis or classification of cancer, or in predicting or monitoring the
efficacy of cancer therapy. An increase in gene copy number can result not only from 
intrachromosomal changes but also from chromosomal aneuploidy. It is important to understand 
that detection of gene amplification can be used for cancer diagnosis even if the determination 
includes measurement of chromosomal aneuploidy. Indeed, as long as a significant difference 
relative to normal tissue is detected, it is irrelevant if the signal originates from an increase in the 
number of gene copies per chromosome and/or an abnormal number of chromosomes. 

6. I understand that according to the Patent Office, absent data demonstrating that 
the increased copy number of a gene in certain types of cancer leads to increased expression of 
its product, gene amplification data are insufficient to provide substantial utility or well 
established utility for the gene product (the encoded polypeptide), or an antibody specifically 
binding the encoded polypeptide. However, even when amplification of a cancer marker gene 
does not result in significant over-expression of the corresponding gene product, this very 
absence of gene product over-expression still provides significant information for cancer 
diagnosis and treatment. Thus, if over-expression of the gene product does not parallel gene
amplification in certain tumor types but does so in others, then parallel monitoring of gene 
amplification and gene product over-expression enables more accurate tumor classification and 
hence better determination of suitable therapy. In addition, absence of over-expression is crucial 
information for the practicing clinician. If a gene is amplified but the corresponding gene 
product is not over-expressed, the clinician accordingly will decide not to treat a patient with 
agents that target that gene product. 

7. I hereby declare that all statements made herein of my own knowledge are true
and that all statements made on information or belief are believed to be true, and further that
these statements were made with the knowledge that willful false statements and the like so
made are punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the
United States Code and that such willful statements may jeopardize the validity of the
application or any patent issued thereon.

differential expression of various genes in tumor cells relative to normal cells.
The purpose of this research is to identify proteins that are abundantly expressed
on certain tumor cells and that are either (i) not expressed, or (ii) expressed at
lower levels, on corresponding normal cells. We call such differentially expressed
proteins "tumor antigen proteins". When such a tumor antigen protein is
identified, one can produce an antibody that recognizes and binds to that protein.
Such an antibody finds use in the diagnosis of human cancer and may ultimately
serve as an effective therapeutic in the treatment of human cancer.

4. In the course of the research conducted by Genentech's Tumor Antigen
studying differential gene expression in human tumor cells relative to normal cells,
above, we have observed that there is a strong correlation between changes in the
level of mRNA present in any particular cell type and the level of protein
observations we have found that increases in the level of a particular mRNA
correlates with changes in the level of protein expressed from that mRNA when
human tumor cells are compared with their corresponding normal cells.

6. Based upon my own experience accumulated in more than 20 years of
research, including the data discussed in paragraphs 4 and 5 above and my
knowledge of the relevant scientific literature, it is my considered scientific
central dogma in molecular biology that increased mRNA levels are predictive of
statements and the like so made are punishable by fine or imprisonment, or both,

Genome-wide Study of Gene Copy Numbers,
Transcripts, and Protein Levels in Pairs of
Non-invasive and Invasive Human Transitional
Cell Carcinomas*

Gain and loss of chromosomal material is characteristic of bladder cancer, as well as
malignant transformation in general. The consequences of these changes at both the
transcription and translation levels is at present unknown partly because of technical
limitations. Here we have attempted to address this question in pairs of non-invasive
and invasive human bladder tumors using a combination of technology that included
comparative genomic hybridization, high density oligonucleotide array-based monitoring
of transcript levels (5600 genes), and high resolution two-dimensional gel
electrophoresis. The results showed that there is a gene dosage effect that in some
cases superimposes on other regulatory mechanisms. This effect depended (p < 0.015) on
the magnitude of the comparative genomic hybridization change. In general (18 of 23
cases), chromosomal areas with more than 2-fold gain of DNA showed a corresponding
increase in mRNA transcripts. Areas with loss of DNA, on the other hand, showed either
reduced or unaltered transcript levels. Because most proteins resolved by
two-dimensional gels are unknown it was only possible to compare mRNA and protein
alterations in relatively few cases of well focused abundant proteins. With few
exceptions we found a good correlation (p < 0.005) between transcript alterations and
protein levels. The implications, as well as limitations, of the approach are
discussed. Molecular & Cellular Proteomics 1:37-45, 2002.

Aneuploidy is a common feature of most human cancers (1), but little is known about the
genome-wide effect of this phenomenon at both the transcription and translation levels.
High throughput array studies of the breast cancer cell line BT474 has suggested that
there is a correlation between DNA copy numbers and gene expression in highly amplified
areas (2), and studies of individual genes in solid tumors have revealed a good
correlation between gene dose and mRNA or protein levels in the case of c-erb-B2,
cyclin D1, ems1, and N-myc (3-5). However, a high cyclin D1 protein expression has been
observed without simultaneous amplification, and a low level of c-myc copy number
increase was observed without concomitant c-myc protein overexpression (6).

In human bladder tumors, karyotyping, fluorescent in situ
hybridization, and comparative genomic hybridization (CGH) 1
have revealed chromosomal aberrations that seem to be
characteristic of certain stages of disease progression. In the
case of non-invasive pTa transitional cell carcinomas (TCCs),
this includes loss of chromosome 9 or parts of it, as well as
loss of Y in males. In minimally invasive pT1 TCCs, the following
alterations have been reported: 2q-, 11p-, 1q+,
11q13+, 17q+, and 20q+ (7-12). It has been suggested that
these regions harbor tumor suppressor genes and oncogenes;
however, the large chromosomal areas involved often
contain many genes, making meaningful predictions of the
functional consequences of losses and gains very difficult.

In this investigation we have combined genome-wide technology
for detecting genomic gains and losses (CGH) with
gene expression profiling techniques (microarrays and proteomics)
to determine the effect of gene copy number on
transcript and protein levels in pairs of non-invasive and invasive
human bladder TCCs.

EXPERIMENTAL PROCEDURES 
Material - Bladder tumor biopsies were sampled after informed
consent was obtained and after removal of tissue for routine pathology
examination. By light microscopy tumors 335 and 532 were
staged by an experienced pathologist as pTa (superficial papillary),
1 The abbreviations used are: CGH, comparative genomic hybridization;
TCC, transitional cell carcinoma; LOH, loss of heterozygosity;
PA-FABP, psoriasis-associated fatty acid-binding protein; 2D,
two-dimensional.

Fig. 1. DNA copy number and mRNA expression level. For each chromosome (Chr.), the panels
show the expression level of specific genes and the overall expression level along the
chromosome. A, expression of mRNA in invasive tumor 733 compared with the non-invasive
counterpart tumor 335. B, expression of mRNA in invasive tumor 827 compared with its
non-invasive counterpart. The ratio profile represents a mean of four chromosomes and is
surrounded by thin curves indicating one standard deviation; it is plotted against a ratio
value of 1 (no change), with vertical lines at ratios of 0.5 (left) and 2.0 (right). In
chromosomes where the non-invasive tumor 335 used for comparison showed alterations in DNA
content, the ratio profile of that chromosome is shown to the right of the invasive tumor
profile. The colored bars represent one gene each, and the colors indicate the expression
level of the gene in the invasive tumor compared with the non-invasive counterpart. A 2-fold
level was chosen as it corresponded to one standard deviation in a double determination of
~1800 genes. Centromeres and heterochromatic regions were excluded from data analysis.



grade I and II, respectively. Tumors 733 and 827 were staged as pT1
(invasive into submucosa); 733 was staged as solid, and 827 was
staged as papillary, both grade III.

mRNA Preparation - Tissue biopsies, obtained fresh from surgery,
were embedded immediately in a sodium-guanidinium thiocyanate
solution and stored at -80 °C. Total RNA was isolated using the
RNAzol B RNA isolation method (WAK-Chemie Medical GMBH).
Poly(A)+ RNA was isolated by an oligo(dT) selection step (Oligotex
mRNA kit; Qiagen).

cRNA Preparation - 1 μg of mRNA was used as starting material.
The first and second strand cDNA synthesis was performed using the
Superscript® choice system (Invitrogen) according to the manufacturer's
instructions but using an oligo(dT) primer containing a T7 RNA
polymerase binding site. Labeled cRNA was prepared using the
MEGAscript® in vitro transcription kit (Ambion). Biotin-labeled CTP and
UTP (Enzo) was used, together with unlabeled NTPs in the reaction.
Following the in vitro transcription reaction, the unincorporated
nucleotides were removed using RNeasy columns (Qiagen).

Array Hybridization and Scanning - Array hybridization and scanning
was modified from a previous method (13). 10 μg of cRNA was
fragmented at 94 °C for 35 min in buffer containing 40 mM Tris
acetate, pH 8.1, 100 mM KOAc, 30 mM MgOAc. Prior to hybridization,
the fragmented cRNA in a 6x SSPE-T hybridization buffer (1 M NaCl,
10 mM Tris, pH 7.6, 0.005% Triton), was heated to 95 °C for 5 min,
subsequently cooled to 40 °C, and loaded onto the Affymetrix probe
array cartridge. The probe array was then incubated for 16 h at 40 °C
at constant rotation (60 rpm). The probe array was exposed to 10
washes in 6x SSPE-T at 25 °C followed by 4 washes in 0.5x SSPE-T
at 50 °C. The biotinylated cRNA was stained with a streptavidin-phycoerythrin
conjugate, 10 μg/ml (Molecular Probes) in 6x SSPE-T
for 30 min at 25 °C followed by 10 washes in 6x SSPE-T at 25 °C. The
probe arrays were scanned at 560 nm using a confocal laser scanning
microscope (made for Affymetrix by Hewlett-Packard). The readings
from the quantitative scanning were analyzed by Affymetrix gene
expression analysis software.

Microsatellite Analysis - Microsatellite analysis was performed as
described previously (14). Microsatellites were selected by use of
www.ncbi.nlm.nih.gov/genemap98, and primer sequences were obtained
from the genome data base at www.gdb.org. DNA was extracted
from tumor and blood and amplified by PCR in a volume of 20 μl for 35
cycles. The amplicons were denatured and electrophoresed for 3 h in an
ABI Prism 377. Data were collected in the Gene Scan program for
fragment analysis. Loss of heterozygosity was defined as less than 33%
of one allele detected in tumor amplicons compared with blood.
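
The 33% cut-off can be made concrete with a small calculation. The sketch below is illustrative only. It assumes that the allele signals are peak heights read from the electropherograms and that the comparison with blood is made by normalizing the tumor allele ratio to the blood allele ratio; the text does not spell out the exact formula, and the function names are hypothetical.

```python
def allele_retention(tumor_peaks, blood_peaks):
    """Fraction of the weaker tumor allele retained, relative to blood.

    Each argument is an (allele_1, allele_2) pair of positive peak heights.
    Normalizing by the blood ratio is an assumption of this sketch.
    """
    t1, t2 = tumor_peaks
    b1, b2 = blood_peaks
    if min(t1, t2, b1, b2) <= 0:
        raise ValueError("peak heights must be positive")
    ratio = (t1 / t2) / (b1 / b2)
    # Express the ratio from the point of view of whichever allele is reduced.
    return min(ratio, 1.0 / ratio)


def is_loh(tumor_peaks, blood_peaks, cutoff=0.33):
    """Call LOH when less than `cutoff` of one allele remains in the tumor."""
    return allele_retention(tumor_peaks, blood_peaks) < cutoff
```

Under these assumptions, a tumor in which one allele peak drops to about a fifth of its blood-normalized height would be called LOH, whereas a modest imbalance would not.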

Proteomic Analysis -TCCs were minced into small pieces and 
homogenized in a small glass homogenizer in 0.5 ml of lysis solution. 
Samples were stored at -20 °C until use. The procedure for 2D gel 
electrophoresis has been described in detail elsewhere (15, 16). Gels
were stained with silver nitrate and/or Coomassie Brilliant Blue.
Proteins were identified by a combination of procedures that included
microsequencing, mass spectrometry, two-dimensional gel Western
immunoblotting, and comparison with the master two-dimensional gel
image of human keratinocyte proteins; see biobase.dk/cgi-bin/celis.

CGH- Hybridization of differentially labeled tumor and normal DNA 
to normal metaphase chromosomes was performed as described 
previously (10). Fluorescein-labeled tumor DNA (200 ng), Texas Red-labeled
reference DNA (200 ng), and human Cot-1 DNA (20 μg) were
denatured at 37 °C for 5 min and applied to denatured normal
metaphase slides. Hybridization was at 37 °C for 2 days. After washing,
the slides were counterstained with 0.15 μg/ml 4,6-diamidino-2-phenylindole
in an anti-fade solution. A second hybridization was performed
for all tumor samples using fluorescein-labeled reference DNA
and Texas Red-labeled tumor DNA (inverse labeling) to confirm the 
aberrations detected during the initial hybridization. Each CGH ex- 
periment also included a normal control hybridization using fluores- 
cein- and Texas Red-labeled normal DNA. Digital image analysis was 
used to identify chromosomal regions with abnormal fluorescence 
ratios, indicating regions of DNA gains and losses. The average 
green:red fluorescence intensity ratio profiles were calculated using 
four images of each chromosome (eight chromosomes total) with 
normalization of the green:red fluorescence intensity ratio for the
entire metaphase and background correction. Chromosome identification
was performed based on 4,6-diamidino-2-phenylindole banding
patterns. Only images showing uniform high intensity fluorescence
with minimal background staining were analyzed. All
centromeres, p arms of acrocentric chromosomes, and heterochro- 
matic regions were excluded from the analysis. 

RESULTS 

Comparative Genomic Hybridization— The CGH analysis 
identified a number of chromosomal gains and losses in the 
Table I

Correlation between alterations detected by CGH and by expression monitoring
Top, CGH used as independent variable (if CGH alteration - what expression ratio was found); bottom, altered expression used as
independent variable (if expression alteration - what CGH deviation was found).

Top: CGH alterations and the expression change clusters found in those areas

Tumor pair     CGH alterations   Up-regulation   Down-regulation   No change   Concordance
733 vs. 335    13 Gain           10              0                 3           77%
733 vs. 335    10 Loss           1               5                 4           50%
827 vs. 532    10 Gain           8               0                 2           80%
827 vs. 532    12 Loss           3               2                 7           17%

Bottom: expression change clusters and the CGH alterations found in those areas

Tumor pair     Expression change clusters   Gain   Loss   No change   Concordance
733 vs. 335    16 Up-regulation             11     2      3           69%
733 vs. 335    21 Down-regulation           1      8      12          38%
733 vs. 335    15 No change                 3      3      9           60%
827 vs. 532    17 Up-regulation             10     5      2           59%
827 vs. 532    9 Down-regulation            0      3      6           33%
827 vs. 532    21 No change                 1      3      17          81%




two invasive tumors (stage pT1 , TCCs 733 and 827), whereas 
the two non-invasive papillomas (stage pTa, TCCs 335 and 
532) showed only 9p-, 9q22-q33-, and X-, and 7+, 9q-,
and Y-, respectively. Both invasive tumors showed changes
(1q22-24+, 2q14.1-qter-, 3q12-q13.3-, 6q12-q22-,
9q34+, 11q12-q13+, 17+, and 20q11.2-q12+) that are typical
for their disease stage, as well as additional alterations,
some of which are shown in Fig. 1. Areas with gains and
losses deviated from the normal copy number to some extent,
and the average numerical deviation from normal was 0.4-fold
in the case of TCC 733 and 0.3-fold for TCC 827. The largest
changes, amounting to at least a doubling of chromosomal
content, were observed at 1q23 in TCC 733 (Fig. 1A) and
20q12 in TCC 827 (Fig. 1B).

mRNA Expression in Relation to DNA Copy Number-The 
mRNA levels from the two invasive tumors (TCCs 827 and 
733) were compared with the two non-invasive counterparts 
(TCCs 532 and 335). This was done in two separate experi- 
ments in which we compared TCCs 733 to 335 and 827 to 
532, respectively, using two different scaling settings for the 
arrays to rule out scaling as a confounding parameter. Ap- 
proximately 1,800 genes that yielded a signal on the arrays 
were searched in the Unigene and Genemap data bases for 
chromosomal location, and those with a known location 
(1096) were plotted as bars covering their purported locus. In 
that way it was possible to construct a graphic presentation of 
DNA copy number and relative mRNA levels along the individual
chromosomes (Fig. 1).

For each mRNA a ratio was calculated between the level in 
the invasive versus the non-invasive counterpart. Bars, which 
represent chromosomal location of a gene, were color-coded 
according to the expression ratio, and only differences larger
than 2-fold were regarded as informative (Fig. 1). The density
of genes along the chromosomes varied, and areas containing
only one gene were excluded from the calculations. The
resolution of the CGH method is very low, and some of the
outlier data may arise because the boundaries of
the chromosomal aberrations are not known at high resolution.
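
The scoring rule described here can be sketched in a few lines. This is a hedged illustration, not the authors' software. It assumes a strict "larger than 2-fold" comparison, applies the legend's requirement that at least half of the mRNAs in a region change in the same direction, and treats ties between up and down as no change; that tie rule and all function names are assumptions.

```python
def classify_ratio(ratio, cutoff=2.0):
    """Label one invasive/non-invasive mRNA ratio using a 2-fold cut-off."""
    if ratio > cutoff:
        return "up"
    if ratio < 1.0 / cutoff:
        return "down"
    return "unchanged"


def region_call(ratios, cutoff=2.0):
    """Score a chromosomal region from the ratios of the genes mapped to it.

    Regions containing only one gene are excluded (returns None).
    """
    if len(ratios) < 2:
        return None
    calls = [classify_ratio(r, cutoff) for r in ratios]
    up, down = calls.count("up"), calls.count("down")
    # At least half of the mRNAs must change in the same direction.
    if 2 * up >= len(calls) and up > down:
        return "up"
    if 2 * down >= len(calls) and down > up:
        return "down"
    return "no change"
```

For example, a region with ratios 3.0, 2.5, and 1.1 would be scored as up-regulated, while a single-gene region is left out of the calculation.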

Two sets of calculations were made from the data. For the 
first set we used CGH alterations as the independent variable 
and estimated the frequency of expression alterations in these 
chromosomal areas. In general, areas with a strong gain of 
chromosomal material contained a cluster of genes having 
increased mRNA expression. For example, chromosomes
1q21-q25, 2p, and 9q each showed a relative gain of more
than 100% in DNA copy number that was accompanied by
increased mRNA expression levels in the two tumor pairs (Fig.
1). In most cases, chromosomal gains detected by CGH were
accompanied by an increased level of transcripts in both
TCCs 733 (77%) and 827 (80%) (Table I, top). Chromosomal
losses, on the other hand, were not accompanied by de- 
creased expression in several cases, and were often regis- 
tered as having unaltered RNA levels (Table I, top). The inabil- 
ity to detect RNA expression changes in these cases was not 
because of fewer genes mapping to the lost regions (data not 
shown). 
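
The concordance percentages quoted from Table I (top) follow directly from the counts in the table. The short sketch below reproduces that arithmetic; the counts are transcribed from Table I, while the dictionary layout and function name are illustrative choices only.

```python
# Counts from Table I (top): number of areas with a given CGH alteration and
# how many of them showed the matching expression change (gain -> up,
# loss -> down).
TABLE_I_TOP = {
    ("733 vs. 335", "Gain"): {"areas": 13, "concordant": 10},
    ("733 vs. 335", "Loss"): {"areas": 10, "concordant": 5},
    ("827 vs. 532", "Gain"): {"areas": 10, "concordant": 8},
    ("827 vs. 532", "Loss"): {"areas": 12, "concordant": 2},
}


def concordance_percent(concordant, areas):
    """Percentage of altered areas whose expression changed the same way."""
    return round(100 * concordant / areas)


percentages = {
    key: concordance_percent(row["concordant"], row["areas"])
    for key, row in TABLE_I_TOP.items()
}
```

The resulting values are 77% and 50% for TCC 733 (gain, loss) and 80% and 17% for TCC 827, matching the table.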

Fig. 2. Correlation between maximum CGH aberration and the ability to
detect expression change by oligonucleotide array ... counterparts 532
and 335. The expression change was taken from the Expression line to the
right in Fig. 1, which depicts the resulting expression change for a
given region. At least half of the mRNAs from a given region have to be
either up- or down-regulated to be scored as an expression change. All
chromosomal arms in which the CGH ratio plus or minus one standard
deviation was outside the ratio value of one were included.

In the second set of calculations we selected expression
alterations above 2-fold as the independent variable and estimated
the frequency of CGH alterations in these areas. As
above, we found that increased transcript expression correlated
with gain of chromosomal material (TCC 733, 69% and
TCC 827, 59%), whereas reduced expression was often detected
in areas with unaltered CGH ratios (Table I, bottom).
Furthermore, as a control we looked at areas with no alteration
in expression. No alteration was detected by CGH in
most of these areas (TCC 733, 60% and TCC 827, 81%; see
Table I, bottom). Because the ability to observe reduced or 
increased mRNA expression clustering to a certain chromo- 
somal area clearly reflected the extent of copy number 
changes, we plotted the maximum CGH aberrations in the 
regions showing CGH changes against the ability to detect a 
change in mRNA expression as monitored by the oligonucleotide
arrays (Fig. 2). For both tumors TCC 733 (p < 0.015) and
TCC 827 (p < 0.00003) a highly significant correlation was
observed between the level of CGH ratio change (reflecting
the DNA copy number) and alterations detected by the
array-based technology (Fig. 2). Similar data were obtained when
areas with altered expression were used as independent variables.
These areas correlated best with CGH when the CGH
ratio deviated 1.6- to 2.0-fold (Table I, bottom) but mostly did
not at lower CGH deviations. These data probably reflect that 
loss of an allele may only lead to a 50% reduction in expres- 
sion level, which is at the cut-off point for detection of expres- 
sion alterations. Gain of chromosomal material can occur to a 
much larger extent. 
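
The asymmetry between losses and gains can be illustrated with a simple gene-dosage calculation. The sketch assumes, purely for illustration, that transcript level is proportional to copy number and that the reference tumor carries two copies; neither assumption is established by the data, and the function names are hypothetical.

```python
def expected_ratio(tumor_copies, reference_copies=2):
    """Expected expression ratio if transcript level scaled with copy number."""
    return tumor_copies / reference_copies


def detectable(ratio, cutoff=2.0):
    """True if the ratio differs by more than `cutoff`-fold in either direction."""
    return ratio > cutoff or ratio < 1.0 / cutoff


loss_of_one_allele = expected_ratio(1)   # 0.5: exactly at the cut-off
six_copies = expected_ratio(6)           # 3.0: beyond the cut-off
```

Under this model, loss of one allele gives a ratio of 0.5, which sits on the 2-fold cut-off rather than beyond it, whereas a multi-copy gain can exceed the cut-off with room to spare.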

Microsatellite-based Detection of Minor Areas of Losses -
In TCC 733, several chromosomal areas exhibiting DNA
amplification were preceded or followed by areas with a normal
CGH but reduced mRNA expression (see Fig. 1, TCC 733
chromosome 1q32, 2p21, and 7q21 and q32, 9q34, and
10q22). To determine whether these results were because of
undetected loss of chromosomal material in these regions or 



because of other non-structural mechanisms regulating transcription,
we examined two microsatellites positioned at chromosome
1q25-32 and two at chromosome 2p22. Loss of
heterozygosity (LOH) was found at both 1q25 and at 2p22 
indicating that minor deleted areas were not detected with the 
resolution of CGH (Fig. 3). Additionally, chromosome 2p in 
TCC 733 showed a CGH pattern of gain/no change/gain of 
DNA that correlated with transcript increase/decrease/in- 
crease. Thus, for the areas showing increased expression 
there was a correlation with the DNA copy number alterations 
(Fig. 1A). As indicated above, the mRNA decrease observed in
the middle of the chromosomal gain was because of LOH, 
implying that one of the mechanisms for mRNA down-regu- 
lation may be regions that have undergone smaller losses of 
chromosomal material. However, this cannot be detected with 
the resolution of the CGH method. 

In both TCC 733 and TCC 827, the telomeric end of chro- 
mosome 11p showed a normal ratio in the CGH analysis; 
however, clusters of five and three genes, respectively, lost
their expression. Two microsatellites (D11S1760, D11S922) 
positioned close to MUC2, IGF2, and cathepsin D indicated 
LOH as the most likely mechanism behind the loss of expres- 
sion (data not shown). 

A reduced expression of mRNA observed in TCC 733 at 
chromosomes 3q24, 11p11, 12p12.2, 12q21.1, and 16q24 
and in TCC 827 at chromosome 11p15.5, 12p11, 15q11.2, 
and 18q12 was also examined for chromosomal losses using 
microsatellites positioned as close as possible to the gene loci 
Fig. 3. Microsatellite analysis of loss of heterozygosity. Tumor
733 showing loss of heterozygosity at chromosome 1q25, detected
(a) by D1S215 close to Hu class I histocompatibility antigen (gene
number 38 in Fig. 1), (b) by D1S2735 close to cathepsin E (gene
number 41 in Fig. 1), and (c) at chromosome 2p23 by D2S2251 close
to general β-spectrin (gene number 11 in Fig. 1), and (d) tumor 827
showing loss of heterozygosity at chromosome 18q12 by D18S1118
close to mitochondrial 3-oxoacyl-coenzyme A thiolase (gene number
12 in Fig. 1). The upper curves show the electropherogram obtained
from normal DNA from leukocytes (N), and the lower curves show the
electropherogram from tumor DNA (T). In all cases one allele is
partially lost in the tumor amplicon.

showing reduced mRNA transcripts. Only the microsatellite
positioned at 18q12 showed LOH (Fig. 3), suggesting that
transcriptional down-regulation of genes in the other regions 
may be controlled by other mechanisms. 

Relation between Changes in mRNA and Protein Levels - 
2D-PAGE analysis, in combination with Coomassie Brilliant 
Blue and/or silver staining, was carried out on all four tumors 
using fresh biopsy material. 40 well resolved abundant known 
proteins migrating in areas away from the edges of the pH 
Fig. 4. Correlation between protein levels as judged by 2D-PAGE
and transcript ratio. For comparison proteins were divided in
three groups, unaltered in level or up- or down-regulated (horizontal
axis). The mRNA ratio as determined by oligonucleotide arrays was
plotted for each gene (vertical axis). mRNAs that were scored as
present in both tumors used for the ratio calculation; A, mRNAs that
were scored as absent in the invasive tumors (along horizontal axis) or
as absent in non-invasive reference (top of figure). Two different
scalings were used to exclude scaling as a confounder; TCCs 827
and 532 (AA) were scaled with background suppression, and TCCs
733 and 335 (#0) were scaled without suppression. Both comparisons
showed highly significant (p < 0.005) differences in mRNA ratios
between the groups. Proteins shown were as follows: Group A (from
left), phosphoglucomutase 1, glutathione transferase class μ number
4, fatty acid-binding protein homologue, cytokeratin 15, and
cytokeratin 13; B (from left), fatty acid-binding protein homologue, 28-kDa
heat shock protein, cytokeratin 13, and calcyclin; C (from left), α-enolase,
hnRNP B1, 28-kDa heat shock protein, 14-3-3-6, and
pre-mRNA splicing factor; D, mesothelial keratin K7 (type II); E (from
top), glutathione S-transferase-π and mesothelial keratin K7 (type II);
F (from top and left), adenylyl cyclase-associated protein, E-cadherin,
keratin 19, calgizzarin, phosphoglycerate mutase, annexin IV,
cytoskeletal γ-actin, hnRNP A1, integral membrane protein calnexin
(IP90), hnRNP H, brain-type clathrin light chain-a, hnRNP F, 70-kDa
heat shock protein, heterogeneous nuclear ribonucleoprotein A/B,
translationally controlled tumor protein, liver glyceraldehyde-3-phosphate
dehydrogenase, keratin 8, aldehyde reductase, and Na,K-ATPase
^1 subunit; G (from top and left), TCP20, calgizzarin, 70-kDa
heat shock protein, calnexin, hnRNP H, cytokeratin 15, ATP
synthase, keratin 19, triosephosphate isomerase, hnRNP F, liver
glyceraldehyde-3-phosphate dehydrogenase, glutathione S-transferase-π,
and keratin 8; H (from left), plasma gelsolin, autoantigen calreticulin,
thioredoxin, and NAD+-dependent 15-hydroxyprostaglandin
dehydrogenase; I (from top), prolyl 4-hydroxylase β-subunit, cytokeratin 20,
cytokeratin 17, prohibitin, and fructose 1,6-biphosphatase;
J, annexin II; K, annexin IV; L (from top and left), 90-kDa heat
shock protein, prolyl 4-hydroxylase β-subunit, α-enolase, GRP 78,
cyclophilin, and cofilin.

gradient, and having a known chromosomal location, were 
selected for analysis in the TCC pair 827/532. Proteins were 
identified by a combination of methods (see "Experimental 
Procedures"). In general there was a highly significant correlation
(p < 0.005) between mRNA and protein alterations (Fig.
4). Only one gene showed disagreement between transcript
alteration and protein alteration. Except for a group of cyto- 
Fig. 5. Comparison of protein and transcript levels in invasive
and non-invasive TCCs. The upper part of the figure shows a 2D gel
(left) and the oligonucleotide array (right) of TCC 532. The red rectangles
on the upper gel highlight the areas that are compared below.
Identical areas of 2D gels of TCCs 532 and 827 are shown below.
Clearly, cytokeratins 13 and 15 are strongly down-regulated in TCC
827 (red annotation). The tile on the array containing probes for
cytokeratin 15 is enlarged below the array (red arrow) from TCC 532
and is compared with TCC 827. The upper row of squares in each tile
corresponds to perfect match probes; the lower row corresponds to
mismatch probes containing a mutation (used for correction for unspecific
binding). Absence of signal is depicted as black, and the
higher the signal the lighter the color. A high transcript level was
detected in TCC 532 (6151 units) whereas a much lower level was
detected in TCC 827 (absence of signals). For cytokeratin 13, a high
transcript level was also present in TCC 532 (15659 units), and a
much lower level was present in TCC 827 (623 units). The 2D gels at
the bottom of the figure (left) show levels of PA-FABP and adipocyte-FABP
in TCCs 335 and 733 (invasive), respectively. Both proteins are
down-regulated in the invasive tumor. To the right we show the array
tiles for the PA-FABP transcript. A medium transcript level was detected
in the case of TCC 335 (1277 units) whereas very low levels
were detected in TCC 733 (166 units). IEF, isoelectric focusing.



keratins encoded by genes on chromosome 17 (Fig. 5) the 
analyzed proteins did not belong to a particular family. 26 well
focused proteins whose genes had a known chromosomal
location were detected in TCCs 733 and 335, and of these 19
correlated (p < 0.005) with the mRNA changes detected using 
the arrays (Fig. 4). For example, PA-FABP was highly ex- 
pressed in the non-invasive TCC 335 but lost in the invasive 
counterpart (TCC 733; see Fig. 5). The smaller number of 
proteins detected in both 733 and 335 was because of the 
smaller size of the biopsies that were available. 

11 chromosomal regions where CGH showed aberrations 
that corresponded to the changes in transcript levels also 
showed corresponding changes in the protein level (Table II). 
These regions included genes that encode proteins that are 
found to be frequently altered in bladder cancer, namely 
cytokeratins 17 and 20, annexins II and IV, and the fatty
acid-binding proteins PA-FABP and FBP1. Four of these proteins
were encoded by genes in chromosome 17q, a frequently
amplified chromosomal area in invasive bladder
cancers.

DISCUSSION 

Most human cancers have abnormal DNA content, having
lost some chromosomal parts and gained others. The present
study provides some evidence as to the effect of these gains
and losses on gene expression in two pairs of non-invasive
and invasive TCCs using high throughput expression arrays
and proteomics, in combination with CGH. In general, the
results showed that there is a clear individual regulation of the
mRNA expression of single genes, which in some cases was
superimposed by a DNA copy number effect. In most cases,
genes located in chromosomal areas with gains exhibited
increased mRNA expression, whereas areas showing
losses showed either no change or a reduced mRNA expression.
The latter might be because losses most
often are restricted to loss of one allele, and the cut-off point
for detection of expression alterations was a 2-fold change,
thus being at the border of detection. In several cases, however,
an increase or decrease in DNA copy number was
associated with de novo occurrence or complete loss of transcript,
respectively. Some of these transcripts could not be
detected in the non-invasive tumor but were present at relatively
high levels in areas with DNA amplifications in the invasive
tumors (e.g. in TCC 733 transcript from cellular ligand of
annexin II gene (chromosome 1q21) from absent to 2670
arbitrary units; in TCC 827 transcript from small proline-rich
protein 1 gene (chromosome 1q12-q21.1) from absent to
1326 arbitrary units). It may be anticipated from these data
that significant clustering of genes with an increased expression
to a certain chromosomal area indicates an increased
likelihood of gain of chromosomal material in this area.

Table II

Proteins whose expression level correlates with both mRNA and gene dose changes

Protein                 Chromosomal location   Tumor TCC   CGH alteration   Transcript alteration   Protein alteration
Annexin II              1q21                   733         Gain             Abs to Pres             Increase
Annexin IV              2p13                   733         Gain             3.9-Fold up             Increase
Cytokeratin 17          17q12-q21              827         Gain             3.8-Fold up             Increase
Cytokeratin 20          17q21.1                827         Gain             5.6-Fold up             Increase
(PA-)FABP               8q21.2                 827         Loss             10-Fold down            Decrease
FBP1                    9q22                   827         Gain             2.3-Fold up             Increase
Plasma gelsolin         9q31                   827         Gain             Abs to Pres             Increase
Heat shock protein 28   15q12-q13              827         Loss             2.5-Fold up             Decrease
Prohibitin              17q21                  827/733     Gain             3.7-/2.5-Fold up        Increase
Prolyl-4-hydroxyl       17q25                  827/733     Gain             5.7-/1.6-Fold up        Increase
hnRNP B1                7p15                   827         Loss             2.5-Fold down           Decrease

... In cases where the corresponding alterations were found in both TCCs 827 and 733 these are shown as 827/733.

Considering the many possible regulatory mechanisms act- 
ing at the level of transcription, it seems striking that the gene 
dose effects were so clearly detectable in gained areas. One 
hypothetical explanation may lie in the loss of controlled 
methylation in tumor cells (17-19). Thus, it may be possible
that in chromosomes with increased DNA copy numbers two 
or more alleles could be demethylated simultaneously leading 
to a higher transcription level, whereas in chromosomes with 
losses the remaining allele could be partly methylated, turning 
off the process (20, 21). A recent report has documented a 
ploidy regulation of gene expression in yeast, but in this case all 
the genes were present in the same ratio (22), a situation that is 
not analogous to that of cancer cells, which show marked 
chromosomal aberrations, as well as gene dosage effects. 

Several CGH studies of bladder cancer have shown that 
some chromosomal aberrations are common at certain 
stages of disease progression, often occurring in more than 1 
of 3 tumors. In pTa tumors, these include 9p-, 9q-, 1q+, Y- 
(2, 6), and in pT1 tumors, 2q-, 11p-, 11q-, 1q+, 5p+, 8q+,
17q+, and 20q+ (2-4, 6, 7). The pTa tumors studied here 
showed similar aberrations such as 9p- and 9q22-q33- and 
9q- and Y-, respectively. Likewise, the two minimal invasive 
pT1 tumors showed aberrations that are commonly seen at 
that stage, and TCC 827 had a remarkable resemblance to the 
commonly seen pattern of losses and gains, such as 1q22-24 
amplification (seen in both tumors), 11q14-q22 loss, the latter
often linked to 17q+ (both tumors), and 1q+ and 9p-, often
linked to 20q+ and 11q13+ (both tumors) (7-9). These observations
indicate that the pairs of tumors used in this study
exhibit chromosomal changes observed in many tumors, and 
therefore the findings could be of general importance for 
bladder cancer. 

Considering that the mapping resolution of CGH is of about 
20 megabases it is only possible to get a crude picture of 
chromosomal instability using this technique. Occasionally, 
we observed reduced transcript levels close to or inside re- 
gions with increased copy numbers. Analysis of these regions 
by positioning heterozygous microsatellites as close as pos- 
sible to the locus showing reduced gene expression revealed 
loss of heterozygosity in several cases. It seems likely that
multiple and different events occur along each chromosomal 



arm and that the use of cDNA microarrays for analysis of DNA 
copy number changes will reach a resolution that can resolve 
these changes, as has recently been proposed (2). The outlier 
data were not more frequent at the boundaries of the CGH 
aberrations. At present we do not know the mechanism behind
chromosomal aneuploidy and cannot predict whether
chromosomal gains will be transcribed to a larger extent than
the two native alleles. A mechanism such as genetic imprinting has
an impact on the expression level in normal cells and is often
reduced in tumors. However, the relation between imprinting
and gain of chromosomal material is not known.

We regard it as a strength of this investigation that we were 
able to compare invasive tumors to benign tumors rather than 
to normal urothelium, as the tumors studied were biologically 
very close and probably may represent successive steps in 
the progression of bladder cancer. Despite the limited amount 
of fresh tissue available it was possible to apply three different 
state of the art methods. The observed correlation between 
DNA copy number and mRNA expression is remarkable when 
one considers that different pieces of the tumor biopsies were 
used for the different sets of experiments. This indicates that
bladder tumors are relatively homogenous, a notion recently 
supported by CGH and LOH data that showed a remarkable 
similarity even between tumors and distant metastasis (10, 23). 

In the few cases analyzed, mRNA and protein levels 
showed a striking correspondence although in some cases 
we found discrepancies that may be attributed to translational 
regulation, post-translational processing, protein degrada- 
tion, or a combination of these. Some transcripts belong to 
undertranslated mRNA pools, which are associated with few 
translationally inactive ribosomes; these pools, however,
seem to be rare (24). Protein degradation, for example, may 
be very important in the case of polypeptides with a short 
half-life (e.g. signaling proteins). A poor correlation between 
mRNA and protein levels was found in liver cells as determined
by arrays and 2D-PAGE (25), and a moderate correlation
was recently reported by Ideker et al. (26) in yeast.

Interestingly, our study revealed a much better correlation
between gained chromosomal areas and increased mRNA
levels than between loss of chromosomal areas and reduced
mRNA levels. In general, the level of CGH change determined
the ability to detect a change in transcript. One possible
explanation could be that by losing one allele the change in 
mRNA level is not so dramatic as compared with gain of 
material, which can be rather unlimited and may lead to a 
severalfold increase in gene copy number resulting in a much 
higher impact on transcript level. The latter would be much 
easier to detect on the expression arrays as the cut-off point 
was placed at a 2-fold level so as not to be biased by noise on 
the array. Construction of arrays with a better signal to noise 
ratio may in the future allow detection of lesser than 2-fold 
alterations in transcript levels, a feature that may facilitate the 
analysis of the effect of loss of chromosomal areas on tran- 
script levels. 

In eleven cases we found a significant correlation between
DNA copy number, mRNA expression, and protein level. Four 
of these proteins were encoded by genes located at a fre- 
quently amplified area in chromosome 17q. Whether DNA 
copy number is one of the mechanisms behind alteration of 
these eleven proteins is at present unknown and will have to 
be proved by other methods using a larger number of sam- 
ples. One factor making such studies complicated is the large 
extent of protein modification that occurs after translation, 
requiring immunoidentification and/or mass spectrometry to 
correctly identify the proteins in the gels. 

In conclusion, the results presented in this study exemplify 
the large body of knowledge that may be possible to gather in 
the future by combining state of the art techniques that follow 
the pathway from DNA to protein (26). Here, we used a traditional
chromosomal CGH method, but in the future high resolution
CGH based on microarrays with many thousand radiation
hybrid-mapped genes will increase the resolution and information
derived from these types of experiments (2). Combined with
expression arrays analyzing transcripts derived from genes with
known locations, and 2D gel analysis to obtain information at
the post-translational level, a clearer and more developed understanding
of the tumor genome will be forthcoming.

ABSTRACT

Genetic changes underlie tumor progression and may lead to cancer-specific
expression of critical genes. Over 1100 publications have described
the use of comparative genomic hybridization (CGH) to analyze
the pattern of copy number alterations in cancer, but very few of the genes
affected are known. Here, we performed high-resolution CGH analysis on
cDNA microarrays in breast cancer and directly compared copy number
and mRNA expression levels of 13,824 genes to quantitate the impact of
genomic changes on gene expression. We identified and mapped the
boundaries of 24 independent amplicons, ranging in size from 0.2 to 32
Mb. Throughout the genome, both high- and low-level copy number
changes had a substantial impact on gene expression, with 44% of the
highly amplified genes showing overexpression and 10.5% of the highly
overexpressed genes being amplified. Statistical analysis with random
permutation tests identified 270 genes whose expression levels across 14
samples were systematically attributable to gene amplification. These
included most previously described amplified genes in breast cancer and
many novel targets for genomic alterations, including the HOXB7 gene,
the presence of which in a novel amplicon at 17q21.3 was validated in
10.2% of primary breast cancers and associated with poor patient prognosis.
In conclusion, CGH on cDNA microarrays revealed hundreds of
novel genes whose overexpression is attributable to gene amplification.
These genes may provide insights to the clonal evolution and progression
of breast cancer and highlight promising therapeutic targets.

INTRODUCTION 

Gene expression patterns revealed by cDNA microarrays have 
facilitated classification of cancers into biologically distinct catego- 
ries, some of which may explain the clinical behavior of the tumors 
(1-6). Despite this progress in diagnostic classification, the molecular 
mechanisms underlying gene expression patterns in cancer have remained
elusive, and the utility of gene expression profiling in the
identification of specific therapeutic targets remains limited.

Accumulation of genetic defects is thought to underlie the clonal 
evolution of cancer. Identification of the genes that mediate the effects 
of genetic changes may be important by highlighting transcripts that
are actively involved in tumor progression. Such transcripts and their
encoded proteins would be ideal targets for anticancer therapies, as 
demonstrated by the clinical success of new therapies against ampli- 
fied oncogenes, such as ERBB2 and EGFR (7, 8), in breast cancer and 
other solid tumors. Besides amplifications of known oncogenes, over 
Fig. 1. Impact of gene copy number on global gene expression levels. A, percentage of
over- and underexpressed genes (Y axis) according to copy number.
Threshold values used for over- and underexpression were >2.184 (global upper 7% of
the cDNA ratios) and <0.4826 (global lower 7% of the expression ratios). B, percentage
of amplified and deleted genes according to expression ratios. Threshold values for
amplification and deletion were >1.5 and <0.7.



20 recurrent regions of DNA amplification have been mapped in 
breast cancer by CGH⁵ (9, 10). However, these amplicons are often
large and poorly defined, and their impact on gene expression remains
unknown.

We hypothesized that genome-wide identification of those gene 
expression changes that are attributable to underlying gene copy 
number alterations would highlight transcripts that are actively involved
in the causation or maintenance of the malignant phenotype.
To identify such transcripts, we applied a combination of cDNA and
CGH microarrays to: (a) determine the global impact that gene copy
number variation plays in breast cancer development and progression; 
and (b) identify and characterize those genes whose mRNA expres- 



5 The abbreviations used are: CGH, comparative genomic hybridization; FISH, fluorescence
in situ hybridization; RT-PCR, reverse transcription-PCR.
Fig. 2. Genome-wide copy number and expression analysis across the entire genome, from the 1p telomere to the Xq telomere, with ratios plotted as a function of genomic position.



sion is most significantly associated with amplification of the corre- 
sponding genomic template. 

MATERIALS AND METHODS 

Breast Cancer Cell Lines. Fourteen breast cancer cell lines (BT-20, BT-474,
HCC1428, Hs578t, MCF-7, MDA-361, MDA-436, MDA-453, MDA-468,
SKBR-3, T-47D, UACC812, ZR-75-1, and ZR-75-30) were obtained from the 
American Type Culture Collection (Manassas, VA). Cells were grown under 
recommended culture conditions. Genomic DNA and mRNA were isolated 
using standard protocols. 

Copy Number and Expression Analyses by cDNA Microarrays. The 
preparation and printing of the 13,824 cDNA clones on glass slides were 
performed as described (11-13). Of these clones, 244 represented uncharacterized
expressed sequence tags, and the remainder corresponded to known
genes. CGH experiments on cDNA microarrays were done as described (14,
15). Briefly, 20 µg of genomic DNA from breast cancer cell lines and normal
human WBCs were digested for 14-18 h with AluI and RsaI (Life Technologies,
Inc., Rockville, MD) and purified by phenol/chloroform extraction. Six
µg of digested cell line DNAs were labeled with Cy3-dUTP (Amersham
Pharmacia) and normal DNA with Cy5-dUTP (Amersham Pharmacia) using
the Bioprime Labeling kit (Life Technologies, Inc.). Hybridization (14, 15) and
posthybridization washes (13) were done as described. For the expression
analyses, a standard reference (Universal Human Reference RNA; Stratagene,
La Jolla, CA) was used in all experiments. Forty µg of reference RNA were
labeled with Cy3-dUTP and 3.5 µg of test mRNA with Cy5-dUTP, and the
labeled cDNAs were hybridized on microarrays as described (13, 15). For both 
microarray analyses, a laser confocal scanner (Agilent Technologies, Palo 
Alto, CA) was used to measure the fluorescence intensities at the target 
locations using the DEARRAY software (16). After background subtraction, 
average intensities at each clone in the test hybridization were divided by the 
average intensity of the corresponding clone in the control hybridization. For 
the copy number analysis, the ratios were normalized on the basis of the 
distribution of ratios of all targets on the array and for the expression analysis 
on the basis of 88 housekeeping genes, which were spotted four times onto the 
array. Low quality measurements (i.e., copy number data with mean reference
intensity <100 fluorescent units, and expression data with both test and 
reference intensity <100 fluorescent units and/or with spot size <50 units) 



were excluded from the analysis and were treated as missing values. The 
distributions of fluorescence ratios were used to define cutpoints for increased/ 
decreased copy number. Genes with CGH ratio >1.43 (representing the upper
5% of the CGH ratios across all experiments) were considered to be amplified, 
and genes with ratio <0.73 (representing the lower 5%) were considered to be 
deleted. 
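The cutpoint rule described above (upper and lower 5% of the ratio distribution) can be sketched as follows. This is a minimal illustration, not the authors' software: the function names, the linear-interpolation percentile, and the use of `None` for excluded low-quality spots are all assumptions made for the example.

```python
def percentile(values, pct):
    """Percentile (pct in 0-100) with linear interpolation between ranks.
    The interpolation scheme is an assumption; the text does not specify one."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values supplied")
    rank = (len(ordered) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def call_copy_number(ratios, lower_pct=5.0, upper_pct=95.0):
    """Label each normalized CGH ratio as amplified, deleted, normal, or missing.

    `ratios` may contain None for measurements excluded as low quality.
    Cutpoints are taken from the distribution of the valid ratios.
    """
    valid = [r for r in ratios if r is not None]
    low_cut = percentile(valid, lower_pct)
    high_cut = percentile(valid, upper_pct)
    calls = []
    for r in ratios:
        if r is None:
            calls.append("missing")
        elif r > high_cut:
            calls.append("amplified")
        elif r < low_cut:
            calls.append("deleted")
        else:
            calls.append("normal")
    return low_cut, high_cut, calls
```

In the study the cutpoints were derived from all experiments pooled (giving 1.43 and 0.73); the sketch derives them from whatever list it is given.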

Statistical Analysis of CGH and cDNA Microarray Data. To evaluate 
the influence of copy number alterations on gene expression, we applied the 
following statistical approach. CGH and cDNA calibrated intensity ratios were 
log-transformed and normalized using median centering of the values in each
cell line. Furthermore, cDNA ratios for each gene across all 14 cell lines were
median centered. For each gene, the CGH data were represented by a vector
that was labeled 1 for amplification (ratio, >1.43) and 0 for no amplification.
Amplification was correlated with gene expression using the signal-to-noise
statistics (1). We calculated a weight, $w_g$, for each gene as follows:

$$w_g = \frac{m_{g1} - m_{g0}}{\sigma_{g1} + \sigma_{g0}}$$

where $m_{g1}$, $\sigma_{g1}$ and $m_{g0}$, $\sigma_{g0}$ denote the means and SDs for the expression
levels for amplified and nonamplified cell lines, respectively. To assess the
statistical significance of each weight, we performed 10,000 random permutations
of the label vector. The probability that a gene had a larger or equal
weight by random permutation than the original weight was denoted by α. A
low α (<0.05) indicates a strong association between gene expression and
amplification. 
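The weight and its permutation-based α can be illustrated with a short sketch. It assumes sample standard deviations and a fixed random seed, neither of which is stated in the text, and the function names are hypothetical.

```python
import random
import statistics


def signal_to_noise(expression, labels):
    """Signal-to-noise weight: (mean_amp - mean_non) / (sd_amp + sd_non).

    `labels` holds 1 for amplified cell lines and 0 for nonamplified ones.
    Sample SDs are used here; that choice is an assumption.
    """
    amp = [x for x, lab in zip(expression, labels) if lab == 1]
    non = [x for x, lab in zip(expression, labels) if lab == 0]
    if len(amp) < 2 or len(non) < 2:
        raise ValueError("need at least two cell lines in each group")
    spread = statistics.stdev(amp) + statistics.stdev(non)
    if spread == 0:
        raise ValueError("both groups have zero variance")
    return (statistics.mean(amp) - statistics.mean(non)) / spread


def permutation_alpha(expression, labels, n_perm=10000, seed=0):
    """Fraction of label permutations whose weight is >= the observed weight."""
    rng = random.Random(seed)
    observed = signal_to_noise(expression, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if signal_to_noise(expression, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm
```

With 14 cell lines, a gene amplified in only a few of them has a limited number of distinct label arrangements, so the smallest attainable α depends on how many lines carry the amplification.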

Genomic Localization of cDNA Clones and Amplicon Mapping. Each 
cDNA clone on the microarray was assigned to a Unigene cluster using the
Unigene Build 141.⁶ A database of genomic sequence alignment information
for mRNA sequences was created from the August 2001 freeze of the University
of California Santa Cruz's GoldenPath database.⁷ The chromosome and
bp positions for each cDNA clone were then retrieved by relating these data
sets. Amplicons were defined as a CGH copy number ratio >2.0 in at least two
adjacent clones in two or more cell lines or a CGH ratio >2.0 in at least three 
adjacent clones in a single cell line. The amplicon start and end positions were 



* Internet address: bttp*y/re$earcb.r*^ 
7 Internet address: www.genome.ucsc.edu.
Table 1 Summary of independent amplicons in 14 breast cancer cell lines by
CGH microarray

Location             Start (Mb)   End (Mb)   Size (Mb)
1p13                 132.79       132.94      0.2
1q21                 173.52       177.25      3.7
1q22                 179.28       179.57      0.3
3p14                  71.94        74.66      2.7
7p12.1-7p11.2         55.62        60.95      5.3
7q31                 125.73       130.96      5.2
7q32                 140.01       140.68      0.7
8q21.11-8q21.13       86.45        92.46      6.0
8q21.3                98.45       103.05      4.6
8q23.3-8q24.14       129.88       142.15     12.3
8q24.22              151.21       152.16      1.0
9p13                  38.65        39.25      0.6
13q22-q31             77.15        81.38      4.2
16q22                 86.70        87.62      0.9
17q11                 29.30        30.85      1.6
17q12-q21.2           39.79        42.80      3.0
17q21.32-q21.33       52.47        55.80      3.3
17q22-q23.3           63.81        69.70      5.9
17q23.3-q24.3         69.93        74.99      5.1
19q13                 40.63        41.40      0.8
20q11.22              34.59        35.85      1.3
20q13.12              44.00        45.62      1.6
20q13.12-q13.13       46.45        49.43      3.0
20q13.2-q13.32        51.32        59.12      7.8



GENE EXPRESSION PATTERNS IN BREAST CANCER. 

CGH were validated, with 1q21, 17q12-q21.2, 17q22-q23, 20q13.1,
and 20q13.2 regions being most commonly amplified. Furthermore,
the boundaries of these amplicons were precisely delineated. In addition,
novel amplicons were identified at 9p13 (38.65-39.25 Mb)
and 17q21.3 (52.47-55.80 Mb).

Direct Identification of Putative Amplification Target Genes.
The cDNA/CGH microarray technique enables the direct correlation
of copy number and expression data on a gene-by-gene basis
throughout the genome. We directly annotated high-resolution
CGH plots with gene expression data using color coding. Fig. 2C
shows that most of the amplified genes in the MCF-7 breast cancer
cell line at 1p13, 17q22-q23, and 20q13 were highly overexpressed.
A view of chromosome 7 in the MDA-468 cell line
implicates EGFR as the most highly overexpressed and amplified
gene at 7p11-p12 (Fig. 3A). In BT-474, the two known amplicons
at 17q12 and 17q22-q23 contained numerous highly overexpressed
genes (Fig. 3B). In addition, several genes, including the
homeobox genes HOXB2 and HOXB7, were highly amplified in a
previously undescribed independent amplicon at 17q21.3. HOXB7
was systematically amplified (as validated by FISH, Fig. 3B, inset)
as well as overexpressed (as verified by RT-PCR, data not shown) 
in BT-474, UACC812, and ZR-75-30 cells. Furthermore, this novel 



extended to include neighboring nonamplified clones (ratio, <1.5). The amplicon
size determination was partially dependent on local clone density.
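The amplicon definition (ratio >2.0 in at least two adjacent clones in two or more cell lines, or in at least three adjacent clones in a single cell line) can be sketched as below. This is one possible reading of the rule, not the authors' procedure: clones are assumed to be already ordered by genomic position, "two or more cell lines" is interpreted as runs that share at least two adjacent clones, and the boundary extension to nonamplified neighbors is omitted. All names are hypothetical.

```python
def find_runs(ratios, threshold=2.0):
    """Return (start, end) index pairs of consecutive clones with ratio > threshold."""
    runs, start = [], None
    for i, r in enumerate(ratios):
        if r is not None and r > threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(ratios) - 1))
    return runs


def call_amplicons(ratios_by_line, threshold=2.0):
    """Call amplicons from per-cell-line ratio lists ordered by genomic position."""
    runs = [(line, s, e)
            for line, ratios in ratios_by_line.items()
            for s, e in find_runs(ratios, threshold)
            if e - s + 1 >= 2]
    qualifying = []
    for line, s, e in runs:
        if e - s + 1 >= 3:
            # three or more adjacent clones in a single cell line
            qualifying.append((s, e))
            continue
        for other, s2, e2 in runs:
            # a two-clone run needs two shared adjacent clones in another line
            if other != line and min(e, e2) - max(s, s2) + 1 >= 2:
                qualifying.append((s, e))
                break
    merged = []
    for s, e in sorted(qualifying):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

The returned index ranges would still need to be mapped back to base-pair positions and extended outward to the nearest clones with ratio <1.5.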

FISH. Dual-color interphase FISH to breast cancer cell lines was done as 
described (17). Bacterial artificial chromosome clone RP11-361K8 was labeled
with SpectrumOrange (Vysis, Downers Grove, IL), and SpectrumOrange-labeled
probe for EGFR was obtained from Vysis. SpectrumGreen-labeled
chromosome 7 and 17 centromere probes (Vysis) were used as a
reference. A tissue microarray containing 612 formalin-fixed, paraffin-embedded
primary breast cancers (17) was applied in FISH analyses as described
(18). The use of these specimens was approved by the Ethics Committee of the 
University of Basel and by the NIH. Specimens containing a 2-fold or higher 
increase in the number of test probe signals, as compared with corresponding 
centromere signals, in at least 10% of the tumor cells were considered to be
amplified. Survival analysis was performed using the Kaplan-Meier method 
and the log-rank test. 
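The FISH scoring criterion above (a 2-fold or higher excess of test probe signals over centromere signals in at least 10% of tumor cells) could be expressed roughly as follows. The per-cell tuple layout and the function name are assumptions made for illustration only.

```python
def is_amplified(cells, min_fold=2.0, min_fraction=0.10):
    """Decide whether a specimen counts as amplified.

    `cells` is a list of (test_signals, centromere_signals) counts, one pair
    per scored tumor cell. Cells without centromere signals are counted in
    the denominator but cannot be positive.
    """
    if not cells:
        raise ValueError("no scored cells")
    positive = sum(1 for test, cen in cells if cen > 0 and test / cen >= min_fold)
    return positive / len(cells) >= min_fraction
```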

RT-PCR. The HOXB7 expression level was determined relative to 
GAPDH. Reverse transcription and PCR amplification were performed using 
Access RT-PCR System (Promega Corp., Madison, WI) with 10 ng of mRNA
as a template. HOXB7 primers were 5'-GAGCAGAGGGACTCGGACTT-3'
and 5'-GCGTCAGGTAGCGATTGTAG-3'.



RESULTS 

Global Effect of Copy Number on Gene Expression. 13,824 
arrayed cDNA clones were applied for analysis of gene expression 
and gene copy number (CGH microarrays) in 14 breast cancer cell 
lines. The results illustrate a considerable influence of copy number 
on gene expression patterns. Up to 44% of the highly amplified 
transcripts (CGH ratio, >2.5) were overexpressed (i.e., belonged to
the global upper 7% of expression ratios), compared with only 6% for
genes with normal copy number levels (Fig. 1A). Conversely, 10.5%
of the transcripts with high-level expression (cDNA ratio, >10)
showed increased copy number (Fig. 1B). Low-level copy number
increases and decreases were also associated with similar, although 
less dramatic, outcomes on gene expression (Fig. 1). 
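The two percentages quoted above are conditional fractions taken in opposite directions: the share of amplified transcripts that are overexpressed, and the share of overexpressed transcripts that are amplified. A small sketch with invented toy data (not values from the study) shows why the two numbers need not agree. For simplicity it uses the >10 cDNA ratio cutoff for both directions, whereas the text uses the global upper 7% for the first.

```python
def fraction_given(records, condition, outcome):
    """Fraction of the records meeting `condition` that also meet `outcome`."""
    selected = [r for r in records if condition(r)]
    if not selected:
        raise ValueError("no records satisfy the condition")
    return sum(1 for r in selected if outcome(r)) / len(selected)


# Toy (copy-number ratio, expression ratio) pairs, invented for illustration.
toy = [(3.0, 12.0), (2.8, 1.1), (1.0, 11.0), (1.1, 15.0), (1.0, 1.0), (0.9, 0.8)]

HIGH_AMP = 2.5    # CGH ratio cutoff quoted in the text
HIGH_EXPR = 10.0  # cDNA ratio cutoff quoted in the text

over_given_amp = fraction_given(
    toy, lambda r: r[0] > HIGH_AMP, lambda r: r[1] > HIGH_EXPR)
amp_given_over = fraction_given(
    toy, lambda r: r[1] > HIGH_EXPR, lambda r: r[0] > HIGH_AMP)
```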

Identification of Distinct Breast Cancer Amplicons. Base-pair 
locations obtained for 11,994 cDNAs (86.8%) were used to plot copy
number changes as a function of genomic position (Fig. 2, Supplemental
Fig. A). The average spacing of clones throughout the genome
was 267 kb. This high-resolution mapping identified 24 independent
breast cancer amplicons, spanning from 0.2 to 12 Mb of DNA (Table
1). Several amplification sites detected previously by chromosomal
Fig. 4. List of 50 genes with a statistically
significant correlation (α value <0.05) between
gene copy number and gene expression. Name,
chromosomal location, and the α value for each
gene are indicated. The genes have been ordered
according to their position in the genome. The color
maps on the right illustrate the copy number and
expression ratio patterns in the 14 cell lines. The
key to the color code is shown at the bottom of the
graph. Gray squares, missing values. The complete
list of 270 genes is shown in supplemental Fig. B.




amplification was validated to be present in 10.2% of 363 primary 
breast cancers by FISH to a tissue microarray and was associated 
with poor prognosis of the patients (P = 0.001).

Statistical Identification and Characterization of 270 Highly 
Expressed Genes in Amplicons. Statistical comparison of expres- 
sion levels of all genes as a function of gene amplification identified 
270 genes whose expression was significantly influenced by copy 
number across all 14 cell lines (Fig. 4, Supplemental Fig. B). According
to the gene ontology data,⁸ 91 of the 270 genes represented
hypothetical proteins or genes with no functional annotation, whereas 
179 had associated functional information available. Of these, 151 
(84%) are implicated in apoptosis, cell proliferation, signal transduc- 
tion, and transcription, whereas 28 (16%) had functional annotations 
that could not be directly linked with cancer. 



DISCUSSION 

The importance of recurrent gene and chromosome copy number 
changes in the development and progression of solid tumors has been 
characterized in >1000 publications applying CGH⁹ (9, 10), as well
as in a large number of other molecular cytogenetic, cytogenetic, and 
molecular genetic studies. The effects of these somatic genetic 
changes on gene expression levels have remained largely unknown, 
although a few studies have explored gene expression changes occur- 
ring in specific amplicons (15, 19-21). Here, we applied genome- 
wide cDNA microarrays to identify transcripts whose expression 
changes were attributable to underlying gene copy number alterations 
in breast cancer. 

The overall impact of copy number on gene expression patterns was 
substantial with the most dramatic effects seen in the case of high- 



8 Internet address: http://www.geneontology.org/.

9 Internet address: http://www.ncbi.nlm.nih.gov/entrez.
level copy number increase. Low-level copy number gains and losses
also had a significant influence on expression levels of genes in the 
regions affected, but these effects were more subtle on a gene-by-gene 
basis than those of high-level amplifications. However, the impact of 
low-level gains on the dysregulation of gene expression patterns in 
cancer may be equally important if not more important than that of 
high-level amplifications. Aneuploidy and low-level gains and losses 
of chromosomal arms represent the most common types of genetic 
alterations in breast and other cancers and, therefore, have an influence
on many genes. Our results in breast cancer extend the recent
studies on the impact of aneuploidy on global gene expression pat- 
terns in yeast cells, acute myeloid leukemia, and a prostate cancer 
model system (22-24). 

The CGH microarray analysis identified 24 independent breast 
cancer amplicons. We defined the precise boundaries for many am- 
plicons detected previously by chromosomal CGH (9, 10, 25, 26) and 
also discovered novel amplicons that had not been detected previ- 
ously, presumably because of their small size (only 1-2 Mb) or close 
proximity to other larger amplicons. One of these novel amplicons 
involved the homeobox gene region at 17q21.3 and led to the over- 
expression of the HOXB7 and HOXB2 genes. The homeodomain 
transcription factors are known to be key regulators of embryonic 
development and have been occasionally reported to undergo aberrant 
expression in cancer (27, 28). HOXB7 transfection induced cell proliferation
in melanoma, breast, and ovarian cancer cells and increased
tumorigenicity and angiogenesis in breast cancer (29-32). The present
results imply that gene amplification may be a prominent mechanism
for overexpressing HOXB7 in breast cancer and suggest that
HOXB7 contributes to tumor progression and confers an aggressive
disease phenotype in breast cancer. This view is supported by our
finding of amplification of HOXB7 in 10% of 363 primary breast
cancers, as well as an association of amplification with poor prognosis 
of the patients. 

We carried out a systematic search to identify genes whose 
expression levels across all 14 cell lines were attributable to 
amplification status. Statistical analysis revealed 270 such genes 
(representing ~2% of all genes on the array), including not only
previously described amplified genes, such as HER-2, MYC,
EGFR, ribosomal protein S6 kinase, and AIB3, but also numerous
novel genes such as NRAS-related gene (1p13), syndecan-2 (8q22),
and bone morphogenic protein (20q13.1), whose activation by
amplification may similarly promote breast cancer progression. 
Most of the 270 genes have not been implicated previously in 
breast cancer development and suggest novel pathogenetic mech- 
anisms. Although we would not expect all of them to be causally 
involved, it is intriguing that 84% of the genes with associated 
functional information were implicated in apoptosis, cell prolifer- 
ation, signal transduction, transcription, or other cellular processes 
that could directly imply a possible role in cancer progression. 
Therefore, a detailed characterization of these genes may provide 
biological insights to breast cancer progression and might lead to 
the development of novel therapeutic strategies. 

In summary, we demonstrate application of cDNA microarrays
to the analysis of both copy number and expression levels of over
12,000 transcripts throughout the breast cancer genome, roughly
once every 267 kb. This analysis provided: (a) evidence of a
prominent global influence of copy number changes on gene
expression levels; (b) a high-resolution map of 24 independent
amplicons in breast cancer; and (c) identification of a set of 270
genes, the overexpression of which was statistically attributable to
gene amplification. Characterization of a novel amplicon at
17q21.3 implicated amplification and overexpression of the
HOXB7 gene in breast cancer, including a clinical association 
between HOXB7 amplification and poor patient prognosis. Overall,
our results illustrate how the identification of genes activated by 
gene amplification provides a powerful approach to highlight 
genes with an important role in cancer as well as to prioritize and 
validate putative targets for therapy development. 



REFERENCES 

1. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P.,
Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., and
Lander, E. S. Molecular classification of cancer: class discovery and class prediction
by gene expression monitoring. Science (Wash. DC), 286: 531-537, 1999.

2. Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A.,
Boldrick, J. C., Sabet, H., Tran, T., Yu, X., et al. Distinct types of diffuse large B-cell
lymphoma identified by gene expression profiling. Nature (Lond.), 403: 503-511,

3. Bittner, M., Meltzer, P., Chen, Y., Jiang, Y., Seftor, E., Hendrix, M., Radmacher, M.,
Simon, R., Yakhini, Z., Ben-Dor, A., et al. Molecular classification of cutaneous
malignant melanoma by gene expression profiling. Nature (Lond.), 406: 536-540,
2000.

4. Perou, C. M., Sorlie, T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., Rees, C. A.,
Pollack, J. R., Ross, D. T., Johnsen, H., Akslen, L. A., et al. Molecular portraits of
human breast tumours. Nature (Lond.), 406: 747-752, 2000.

5. Dhanasekaran, S. M., Barrette, T. R., Ghosh, D., Shah, R., Varambally, S., Kurachi,
K., Pienta, K. J., Rubin, M. A., and Chinnaiyan, A. M. Delineation of prognostic
biomarkers in prostate cancer. Nature (Lond.), 412: 822-826, 2001.

6. Sorlie, T., Perou, C. M., Tibshirani, R., Aas, T., Geisler, S., Johnsen, H., Hastie, T.,
Eisen, M. B., van de Rijn, M., Jeffrey, S. S., et al. Gene expression patterns of breast
carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad.
Sci. USA, 98: 10869-10874, 2001.

7. Ross, J. S., and Fletcher, J. A. The HER-2/neu oncogene: prognostic factor, predictive
factor and target for therapy. Semin. Cancer Biol., 9: 125-138, 1999.

8. Arteaga, C. L. The epidermal growth factor receptor: from mutant oncogene in
nonhuman cancers to therapeutic target in human neoplasia. J. Clin. Oncol., 19:

9. Knuutila, S., Bjorkqvist, A. M., Autio, K., Tarkkanen, M., Wolf, M., Monni, O.,
Szymanska, J., Larramendy, M. L., Tapper, J., Pere, H., El-Rifai, W., et al. DNA copy
number amplifications in human neoplasms: review of comparative genomic hybridization
studies. Am. J. Pathol., 152: 1107-1123, 1998.

10. Knuutila, S., Autio, K., and Aalto, Y. Online access to CGH data of DNA sequence
copy number changes. Am. J. Pathol., 157: 689, 2000.

11. DeRisi, J., Penland, L., Brown, P. O., Bittner, M. L., Meltzer, P. S., Ray, M., Chen,
Y., Su, Y. A., and Trent, J. M. Use of a cDNA microarray to analyse gene expression
patterns in human cancer. Nat. Genet., 14: 457-460, 1996.

12. Shalon, D., Smith, S. J., and Brown, P. O. A DNA microarray system for analyzing
complex DNA samples using two-color fluorescent probe hybridization. Genome
Res., 6: 639-645, 1996.

13. Mousses, S., Bittner, M. L., Chen, Y., Dougherty, E. R., Baxevanis, A., Meltzer, P. S.,
and Trent, J. M. Gene expression analysis by cDNA microarrays. In: F. J. Livesey and
S. P. Hunt (eds.), Functional Genomics, pp. 113-137. Oxford: Oxford University
Press, 2000.

14. Pollack, J. R., Perou, C. M., Alizadeh, A. A., Eisen, M. B., Pergamenschikov, A.,
Williams, C. F., Jeffrey, S. S., Botstein, D., and Brown, P. O. Genome-wide analysis
of DNA copy-number changes using cDNA microarrays. Nat. Genet., 23: 41-46,
1999.

15. Monni, O., Barlund, M., Mousses, S., Kononen, J., Sauter, G., Heiskanen, M.,
Paavola, P., Avela, K., Chen, Y., Bittner, M. L., and Kallioniemi, A. Comprehensive
copy number and gene expression profiling of the 17q23 amplicon in human breast
cancer. Proc. Natl. Acad. Sci. USA, 98: 5711-5716, 2001.

16. Chen, Y., Dougherty, E. R., and Bittner, M. L. Ratio-based decisions and the
quantitative analysis of cDNA microarray images. J. Biomed. Optics, 2: 364-374,
1997.

17. Barlund, M., Forozan, F., Kononen, J., Bubendorf, L., Chen, Y., Bittner, M. L.,
Torhorst, J., Haas, P., Bucher, C., Sauter, G., et al. Detecting activation of ribosomal
protein S6 kinase by complementary DNA and tissue microarray analysis. J. Natl.
Cancer Inst., 92: 1252-1259, 2000.

18. Andersen, C. L., Hostetter, G., Grigoryan, A., Sauter, G., and Kallioniemi, A.
Improved procedure for fluorescence in situ hybridization on tissue microarrays.
Cytometry, 45: 83-86, 2001.

19. Kauraniemi, P., Barlund, M., Monni, O., and Kallioniemi, A. New amplified and
highly expressed genes discovered in the ERBB2 amplicon in breast cancer by cDNA
microarrays. Cancer Res., 61: 8235-8240, 2001.

20. Clark, J., Edwards, S., John, M., Flohr, P., Gordon, T., Maillard, K., Giddings, I.,
Brown, C., Bagherzadeh, A., Campbell, C., Shipley, J., Wooster, R., and Cooper,
C. S. Identification of amplified and expressed genes in breast cancer by comparative
hybridization onto microarrays of randomly selected cDNA clones. Genes Chromosomes
Cancer, 34: 104-114, 2002.

21. Varis, A., Wolf, M., Monni, O., Vakkari, M. L., Kokkola, A., Moskaluk, C., Frierson,
H., Powell, S. M., Knuutila, S., Kallioniemi, A., and El-Rifai, W. Targets of gene
amplification and overexpression at 17q in gastric cancer. Cancer Res., 62: 2625-2629,
2002.

22. Hughes, T. R., Roberts, C. J., Dai, H., Jones, A. R., Meyer, M. R., Slade, D.,
Burchard, J., Dow, S., Ward, T. R., Kidd, M. J., Friend, S. H., and Marton, M. J.
Widespread aneuploidy revealed by DNA microarray expression profiling. Nat.
Genet., 25: 333-337, 2000.

23. Virtaneva, K., Wright, F. A., Tanner, S. M., Yuan, B., Lemon, W. J., Caligiuri, M. A.,
Bloomfield, C. D., de La Chapelle, A., and Krahe, R. Expression profiling reveals
fundamental biological differences in acute myeloid leukemia with isolated trisomy 8
and normal cytogenetics. Proc. Natl. Acad. Sci. USA, 98: 1124-1129, 2001.

24. Phillips, J. L., Hayward, S. W., Wang, Y., Vasselli, J., Pavlovich, C., Padilla-Nash,
H., Pezullo, J. R., Ghadimi, B. M., Grossfeld, G. D., Rivera, A., Linehan, W. M.,
Cunha, G. R., and Ried, T. The consequences of chromosomal aneuploidy on gene
expression profiles in a cell line model for prostate carcinogenesis. Cancer Res., 61:
8143-8149, 2001.

25. Barlund, M., Tirkkonen, M., Forozan, F., Tanner, M. M., Kallioniemi, O. P., and
Kallioniemi, A. Increased copy number at 17q22-q24 by CGH in breast cancer is due
to high-level amplification of two separate regions. Genes Chromosomes Cancer, 20:
372-376, 1997.

26. Tanner, M. M., Tirkkonen, M., Kallioniemi, A., Isola, J., Kuukasjarvi, T., Collins, C.,
Kowbel, D., Guan, X. Y., Trent, J., Gray, J. W., Meltzer, P., and Kallioniemi, O. P.
Independent amplification and frequent co-amplification of three nonsyntenic regions
on the long arm of chromosome 20 in human breast cancer. Cancer Res., 56:
3441-3445, 1996.

27. Cillo, C., Faiella, A., Cantile, M., and Boncinelli, E. Homeobox genes and cancer.
Exp. Cell Res., 248: 1-9, 1999.

28. Cillo, C., Cantile, M., Faiella, A., and Boncinelli, E. Homeobox genes in normal and
malignant cells. J. Cell. Physiol., 188: 161-169, 2001.

29. Care, A., Silvani, A., Meccia, E., Mattia, G., Stoppacciaro, A., Parmiani, G., Peschle,
C., and Colombo, M. P. HOXB7 constitutively activates basic fibroblast growth
factor in melanomas. Mol. Cell. Biol., 16: 4842-4851, 1996.

30. Care, A., Silvani, A., Meccia, E., Mattia, G., Peschle, C., and Colombo, M. P.
Transduction of the SkBr3 breast carcinoma cell line with the HOXB7 gene induces
bFGF expression, increases cell proliferation and reduces growth factor dependence.
Oncogene, 16: 3285-3289, 1998.

31. Care, A., Felicetti, F., Meccia, E., Bottero, L., Parenza, M., Stoppacciaro, A., Peschle,
C., and Colombo, M. P. HOXB7: a key factor for tumor-associated angiogenic switch.
Cancer Res., 61: 6532-6539, 2001.

32. Naora, H., Yang, Y. Q., Montz, F. J., Seidman, J. D., Kurman, R. J., and Roden, R. B.
A serologically identified tumor antigen encoded by a homeobox gene promotes
growth of ovarian epithelial cells. Proc. Natl. Acad. Sci. USA, 98: 4060-4065, 2001.



Microarray analysis reveals a major direct role of 
DNA copy number alteration in the transcriptional 
program of human breast tumors 

Jonathan R. Pollack, Therese Sørlie, Charles M. Perou, Christian A. Rees, Stefanie S. Jeffrey, Per E. Lønning,
Robert Tibshirani, David Botstein, Anne-Lise Børresen-Dale, and Patrick O. Brown

Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC 27599



Contributed by Patrick O. Brown. August 6, 2002 

Genomic DNA copy number alterations are key genetic events in 
the development and progression of human cancers. Here we 
report a genome-wide microarray comparative genomic hybrid- 
ization (array CGH) analysis of DNA copy number variation in 
a series of primary human breast tumors. We have profiled DNA 
copy number alteration across 6,691 mapped human genes, in 44 
predominantly advanced, primary breast tumors and 10 breast 
cancer cell lines. While the overall patterns of DNA amplification 
and deletion corroborate previous cytogenetic studies, the high-resolution
(gene-by-gene) mapping of amplicon boundaries and
the quantitative analysis of amplicon shape provide significant 
improvement in the localization of candidate oncogenes. Parallel 
microarray measurements of mRNA levels reveal the remarkable 
degree to which variation in gene copy number contributes to 
variation in gene expression in tumor cells. Specifically, we find 
that 62% of highly amplified genes show moderately or highly 
elevated expression, that DNA copy number influences gene ex- 
pression across a wide range of DNA copy number alterations 
(deletion, low-, mid- and high-level amplification), that on average, 
a 2-fold change in DNA copy number is associated with a corresponding
1.5-fold change in mRNA levels, and that overall, at least
12% of all the variation in gene expression among the breast 
tumors is directly attributable to underlying variation in gene copy 
number. These findings provide evidence that widespread DNA 
copy number alteration can lead directly to global deregulation of 
gene expression, which may contribute to the development or 
progression of cancer. 
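Read on a logarithmic scale, the fold changes quoted above imply a slope. As a rough worked example, assuming (purely for illustration; this functional form is not stated in the text) that mRNA level $E$ scales as a power of DNA copy number $C$, $E \propto C^{\beta}$:

$$\beta = \frac{\log_2 1.5}{\log_2 2} \approx 0.585$$

Under that assumption, a 4-fold copy number gain would correspond to roughly $4^{0.585} = 1.5^{2} = 2.25$-fold higher mRNA.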

Conventional cytogenetic techniques, including comparative 
genomic hybridization (CGH) (1), have led to the identifi- 
cation of a number of recurrent regions of DNA copy number 
alteration in breast cancer cell lines and tumors (2-4). While 
some of these regions contain known or candidate oncogenes 
[e.g., FGFR1 (8p11), MYC (8q24), CCND1 (11q13), ERBB2
(17q12), and ZNF217 (20q13)] and tumor suppressor genes
[RB1 (13q14) and TP53 (17p13)], the relevant gene(s) within
other regions (e.g., gain of 1q, 8q22, and 17q22-24, and loss of
8p) remain to be identified. A high-resolution genome-wide 
map, delineating the boundaries of DNA copy number alter- 
ations in tumors, should facilitate the localization and identifi- 
cation of oncogenes and tumor suppressor genes in breast 
cancer. In this study, we have created such a map, using
array-based CGH (5-7) to profile DNA copy number alteration 
in a series of breast cancer cell lines and primary tumors. 

An unresolved question is the extent to which the widespread 
DNA copy number changes that we and others have identified 
in breast tumors alter expression of genes within involved 
regions. Because we had measured mRNA levels in parallel in 
the same samples (8), using the same DNA microarrays, we had 
an opportunity to explore on a genomic scale the relationship 
between DNA copy number changes and gene expression. From 



this analysis, we have identified a significant impact of wide- 
spread DNA copy number alteration on the transcriptional 
programs of breast tumors. 

Materials and Methods 

Tumors and Cell Lines. Primary breast tumors were predominantly
large (>3 cm), intermediate-grade, infiltrating ductal carcinomas,
with more than 50% being lymph node positive. The
fraction of tumor cells within specimens averaged at least 50%.
Details of individual tumors have been published (8, 9), and 
are summarized in Table 1, which is published as supporting 
information on the PNAS web site, www.pnas.org. Breast cancer 
cell lines were obtained from the American Type Culture 
Collection. Genomic DNA was isolated either using Qiagen 
genomic DNA columns, or by phenol/chloroform extraction 
followed by ethanol precipitation. 

DNA Labeling and Microarray Hybridizations. Genomic DNA label- 
ing and hybridizations were performed essentially as described 
in Pollack et al. (7), with slight modifications. Two micrograms 
of DNA was labeled in a total volume of 50 microliters and the 
volumes of all reagents were adjusted accordingly. "Test" DNA 
(from tumors and cell lines) was fluorescently labeled (Cy5) and 
hybridized to a human cDNA microarray containing 6,691 
different mapped human genes (i.e., UniGene clusters). The 
"reference" (labeled with Cy3) for each hybridization was nor- 
mal female leukocyte DNA from a single donor. The fabrication 
of cDNA microarrays and the labeling and hybridization of 
mRNA samples have been described (8). 

Data Analysis and Map Positions. Hybridized arrays were scanned 
on a GenePix scanner (Axon Instruments, Foster City, CA), and 
fluorescence ratios (test/reference) calculated using ScanAlyze 
software (available at http://rana.lbl.gov). Fluorescence ratios 
were normalized for each array by setting the average log 
fluorescence ratio for all array elements equal to 0. Measure- 
ments with fluorescence intensities more than 20% above back- 
ground were considered reliable. DNA copy number profiles 
that deviated significantly from background ratios measured in 
normal genomic DNA control hybridizations were interpreted as 
evidence of real DNA copy number alteration (see Estimating 
Significance of Altered Fluorescence Ratios in the supporting 
information). When indicated, DNA copy number profiles are 
displayed as a moving average (symmetric 5-nearest neighbors). 
Map positions for arrayed human cDNAs were assigned by 
identifying the starting position of the best and longest match of 
any DNA sequence represented in the corresponding UniGene 
cluster (10) against the "Golden Path" genome assembly 
(http://genome.ucsc.edu/; Oct 7, 2000 Freeze). For UniGene 
clusters represented by multiple arrayed elements, mean fluo- 
rescence ratios (for all elements representing the same UniGene 
cluster) are reported. For mRNA measurements, fluorescence 
ratios are "mean-centered" (i.e., reported relative to the mean 
ratio across the 44 tumor samples). The data set described here 
can be accessed in its entirety in the supporting information. 
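
As an illustration only, the normalization, reliability filter, moving average, and mean-centering steps described above can be sketched in Python. The function names, the use of base-2 logarithms, the reading of "symmetric 5-nearest neighbors" as a five-gene window (the gene plus two neighbors on each side in map order), and the truncation of that window at the ends of the list are all assumptions of this sketch, not details taken from the published analysis.

```python
import math
from statistics import mean


def is_reliable(intensity, background, margin=0.20):
    """Keep a measurement only if its intensity is more than 20% above background."""
    return intensity > background * (1.0 + margin)


def normalize_log_ratios(ratios):
    """Log-transform test/reference ratios and shift them so the array-wide mean is 0.

    Base 2 is an assumption; the text only says 'log'.
    """
    logs = [math.log2(r) for r in ratios]
    offset = mean(logs)
    return [x - offset for x in logs]


def moving_average(values, half_window=2):
    """Symmetric moving average over genes in map order.

    half_window=2 gives a five-gene window; the window is simply
    truncated at either end of the list (an assumption).
    """
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


def mean_center(log_ratios_across_samples):
    """Express one gene's mRNA log ratios relative to their mean across samples."""
    center = mean(log_ratios_across_samples)
    return [x - center for x in log_ratios_across_samples]


# Invented toy values, for illustration only.
normalized = normalize_log_ratios([1.0, 2.0, 4.0])
smoothed = moving_average([0.0, 0.0, 5.0, 0.0, 0.0])
```

Centering on the mean log ratio treats the typical array element as unchanged in copy number, which is why a real gain or loss then appears as a deviation from 0.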

Results 

We performed CGH on 44 predominantly locally advanced, 
primary breast tumors and 10 breast cancer cell lines, using 
cDNA microarrays containing 6,691 different mapped human 
genes (Fig. 1a; also see Materials and Methods for details of 
microarray hybridizations). To take full advantage of the im- 
proved spatial resolution of array CGH, we ordered (fluores- 
cence ratios for) the 6,691 cDNAs according to the "Golden 
Path" (http://genome.ucsc.edu/) genome assembly of the draft 
human genome sequences (11). In so doing, arrayed cDNAs not 
only themselves represent genes of potential interest (e.g., 
candidate oncogenes within amplicons), but also provide precise 
genetic landmarks for chromosomal regions of amplification and 
deletion. Parallel analysis of DNA from cell lines containing 
different numbers of X chromosomes (Fig. 1b), as we did before 
(7), demonstrated the sensitivity of our method to detect single- 
copy loss (45,XO), and 1.5- (47,XXX), 2- (48,XXXX), or 
2.5-fold (49,XXXXX) gains (also see Fig. 5, which is published 
as supporting information on the PNAS web site). Fluorescence 
ratios were linearly proportional to copy number ratios, which 
were slightly underestimated, in agreement with previous ob- 
servations (7). Numerous DNA copy number alterations were 
evident in both the breast cancer cell lines and primary tumors 
(Fig. 1a), detected in the tumors despite the presence of euploid 
non-tumor cell types; the magnitudes of the observed changes 
were generally lower in the tumor samples. DNA copy-number 
alterations were found in every cancer cell line and tumor, and 
on every human chromosome in at least one sample. Recurrent 
regions of DNA copy number gain and loss were readily iden- 
tifiable. For example, gains within 1q, 8q, 17q, and 20q were 
observed in a high proportion of breast cancer cell lines/tumors 
(90%/69%, 100%/47%, 100%/60%, and 90%/44%, respective- 
ly), as were losses within 1p, 3p, 8p, and 13q (80%/24%, 
80%/22%, 80%/22%, and 70%/18%, respectively), consistent 
with published cytogenetic studies (refs. 2-4; a complete listing 
of gains/losses is provided in Tables 2 and 3, which are published 
as supporting information on the PNAS web site). The total 
number of genomic alterations (gains and losses) was found to 
be significantly higher in breast tumors that were high grade (P = 
0.008), consistent with published CGH data (3), estrogen recep- 
tor negative (P = 0.04), and harboring TP53 mutations (P = 
0.0006) (see Table 4, which is published as supporting informa- 
tion on the PNAS web site). 
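
The X-chromosome titration mentioned earlier in this paragraph rests on simple arithmetic: with a normal female reference carrying two X chromosomes, the expected copy number ratio is the number of X chromosomes in the test line divided by two. A minimal sketch follows; the function and variable names are hypothetical, and the sketch gives the ideal ratios only, whereas the text notes that measured fluorescence ratios slightly underestimate them.

```python
def expected_x_ratio(test_x_copies, reference_x_copies=2):
    """Ideal test/reference copy number ratio for X-linked genes.

    The reference is assumed to be normal female DNA with two X chromosomes,
    as stated for the reference sample in Materials and Methods.
    """
    return test_x_copies / reference_x_copies


# Karyotypes named in the text, mapped to their X-chromosome counts.
x_copies = {"45,XO": 1, "47,XXX": 3, "48,XXXX": 4, "49,XXXXX": 5}
expected = {karyotype: expected_x_ratio(n) for karyotype, n in x_copies.items()}
```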

The improved spatial resolution of our array CGH analysis is 
illustrated for chromosome 8, which displayed extensive DNA 
copy number alteration in our series. A detailed view of the 
variation in the copy number of 241 genes mapping to chromo- 
some 8 revealed multiple regions of recurrent amplification; 
each of these potentially harbors a different known or previously 
uncharacterized oncogene (Fig. 2a). The complexity of amplicon 
structure is most easily appreciated in the breast cancer cell line 
SKBR3. Although a conventional CGH analysis of 8q in SKBR3 
identified only two distinct regions of amplification (12), we 
observed three distinct regions of high-level amplification (la- 
beled 1-3 in Fig. 2b). For each of these regions we can define the 
boundaries of the interval recurrently amplified in the tumors we 
examined; in each case, known or plausible candidate oncogenes 
can be identified (a description of these regions, as well as the 
recurrently amplified regions on chromosomes 17 and 20, can be 
found in Figs. 6 and 7, which are published as supporting 
information on the PNAS web site). 

For a subset of breast cancer cell lines and tumors (4 and 37, 
respectively), and a subset of arrayed genes (6,095), mRNA 
levels were quantitatively measured in parallel by using cDNA 
microarrays (8). The parallel assessment of mRNA levels is 
useful in the interpretation of DNA copy number changes. For 
example, the highly amplified genes that are also highly ex- 
pressed are the strongest candidate oncogenes within an ampli- 
con. Perhaps more significantly, our parallel analysis of DNA 
copy number changes and mRNA levels provides us the oppor- 
tunity to assess the global impact of widespread DNA copy 
number alteration on gene expression in tumor cells. 

A strong influence of DNA copy number on gene expression 
is evident in an examination of the pseudocolor representations 
of DNA copy number and mRNA levels for genes on chromo- 
some 17 (Fig. 3). The overall patterns of gene amplification and 
elevated gene expression are quite concordant; i.e., a significant 
fraction of highly amplified genes appear to be correspondingly 
highly expressed. The concordance between high-level amplifi- 
cation and increased gene expression is not restricted to chro- 
mosome 17. Genome-wide, of 117 high-level DNA amplifica- 
tions (fluorescence ratios >4, and representing 91 different 
genes), 62% (representing 54 different genes; see Table 5, which 
is published as supporting information on the PNAS web site) 
are found associated with at least moderately elevated mRNA 
levels (mean-centered fluorescence ratios >2), and 42% (rep- 
resenting 36 different genes) are found associated with compa- 
rably highly elevated mRNA levels (mean-centered fluorescence 
ratios >4). 
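
The concordance figures above amount to a thresholded count. A minimal sketch of that calculation, assuming each observation is available as a pair of (DNA fluorescence ratio, mean-centered mRNA fluorescence ratio), might look as follows. The function name and the toy values are invented for illustration and are not the study's data.

```python
def concordance(pairs, dna_cutoff=4.0, mrna_cutoff=2.0):
    """Count high-level amplifications and the fraction with elevated mRNA.

    pairs: iterable of (dna_ratio, mean_centered_mrna_ratio).
    Returns (number_amplified, fraction_with_mrna_above_cutoff).
    """
    amplified = [(dna, mrna) for dna, mrna in pairs if dna > dna_cutoff]
    if not amplified:
        return 0, float("nan")
    elevated = sum(1 for _, mrna in amplified if mrna > mrna_cutoff)
    return len(amplified), elevated / len(amplified)


# Invented toy observations: three amplified (ratio >4), one not.
toy = [(5.0, 3.0), (6.0, 4.5), (4.5, 1.2), (1.0, 2.5)]
n_amplified, frac_moderate = concordance(toy, mrna_cutoff=2.0)
_, frac_high = concordance(toy, mrna_cutoff=4.0)
```

Raising `mrna_cutoff` from 2 to 4 corresponds to moving from "at least moderately elevated" to "comparably highly elevated" expression in the text.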

To determine the extent to which DNA deletion and lower- 
level amplification (in addition to high-level amplification) are 
also associated with corresponding alterations in mRNA levels, 
we performed three separate analyses on the complete data set 
(4 cell lines and 37 tumors, across 6,095 genes). First, we 
determined the average mRNA levels for each of five classes 
of genes, representing DNA deletion, no change, and low-, 
medium-, and high-level amplification (Fig. 4a). For both the 
breast cancer cell lines and tumors, average mRNA levels 
tracked with DNA copy number across all five classes, in a 
statistically significant fashion (P values for pair-wise Student's 
t tests comparing adjacent classes: cell lines, 4 x 10^-49, 1 x 10^-49, 
5 x 10^-5, 1 x 10^-2; tumors, 1 x 10^-43, 1 x 10^-214, 5 x 10^-41, 
1 x 10^-4). A linear regression of the average log(DNA copy 
number), for each class, against average log(mRNA level) 
demonstrated that on average, a 2-fold change in DNA copy 
number was accompanied by 1.4- and 1.5-fold changes in mRNA 
level for the breast cancer cell lines and tumors, respectively (Fig. 
4a, regression line not shown). Second, we characterized the 
distribution of the 6,095 correlations between DNA copy num- 
ber and mRNA level, each across the 37 tumor samples (Fig. 4b). 
The distribution of correlations forms a normal-shaped curve, 
but with the peak markedly shifted in the positive direction from 
zero. This shift is statistically significant, as evidenced in a plot 
of observed vs. expected correlations (Fig. 4c), and reflects a 
pervasive global influence of DNA copy number alterations on 
gene expression. Notably, the highest correlations between DNA 
copy number and mRNA level (the right tail of the distribution 
in Fig. 4b) comprise both amplified and deleted genes (data not 
shown). Third, we used a linear regression model to estimate the 
fraction of all variation measured in mRNA levels among the 37 
tumors that could be attributed to underlying variation in DNA 
copy number. From this analysis, we estimate that, overall, about 
7% of all of the observed variation in mRNA levels can be 
explained directly by variation in copy number of the altered 
genes (Fig. 4d). We can reduce the effects of experimental 
measurement error on this estimate by using only that fraction 
of the data most reliably measured (fluorescence intensity/ 
background >3); using that data, our estimate of the percent 
variation in mRNA levels directly attributed to variation in gene 
copy number increases to 12% (Fig. 4e). This still undoubtedly 
represents a significant underestimate, as the observed variation 
in global gene expression is affected not only by true variation in 
the expression programs of the tumor cells themselves, but also 
by the variable presence of non-tumor cell types within clinical 
samples. 
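
To make the regression statements above concrete, the following sketch fits a straight line to log-transformed values, converts the slope into the mRNA fold change that accompanies a 2-fold change in DNA copy number, and reports the squared correlation as the fraction of variation explained. It is a simplified illustration under stated assumptions: base-2 logarithms, ordinary least squares on a single predictor, and invented toy values. The exact model used in the study is not specified in this text.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs.

    Returns (slope, intercept, r_squared). Assumes xs and ys each vary.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def fold_change_per_doubling(slope):
    """mRNA fold change per 2-fold change in DNA copy number (log2-log2 slope)."""
    return 2.0 ** slope


# Invented class averages (log2 DNA ratio, log2 mRNA ratio) for five classes.
log2_dna = [-1.0, 0.0, 1.0, 2.0, 3.0]
log2_mrna = [-0.5, 0.0, 0.5, 1.0, 1.5]
slope, intercept, r2 = fit_line(log2_dna, log2_mrna)
fold = fold_change_per_doubling(slope)
```

With these toy values the slope is 0.5, so a doubling of copy number corresponds to about a 1.4-fold change in mRNA level. On per-gene, per-sample data the same `r_squared` would be far below 1, which is the sense in which only a fraction of expression variation is attributed to copy number.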

Discussion 

This genome-wide, array CGH analysis of DNA copy number 
alteration in a series of human breast tumors demonstrates the 
usefulness of defining amplicon boundaries at high resolution 
(gene-by-gene), and quantitatively measuring amplicon shape, to 
assist in locating and identifying candidate oncogenes. By ana- 
lyzing mRNA levels in parallel, we have also discovered that 
changes in DNA copy number have a large, pervasive, direct 
effect on global gene expression patterns in both breast cancer 
cell lines and tumors. Although the DNA microarrays used in our 
analysis may display a bias toward characterized and/or highly 
expressed genes, because we are examining such a large fraction 
of the genome (approximately 20% of all human genes), and 
because, as detailed above, we are likely underestimating the 
contribution of DNA copy number changes to altered gene 
expression, we believe our findings are likely to be generalizable 
(but would nevertheless still be remarkable if only applicable to 
this set of ~6,100 genes). 

In budding yeast, aneuploidy has been shown to result in 
chromosome-wide gene expression biases (13). Two recent 
studies have begun to examine the global relationship between 
DNA copy number and gene expression in cancer cells. In 
agreement with our findings, Phillips et al. (14) have shown that 
with the acquisition of tumorigenicity in an immortalized pros- 
tate epithelial cell line, new chromosomal gains and losses 
resulted in a statistically significant respective increase and 
decrease in the average expression level of involved genes. In 
contrast, Platzer et al. (15) recently reported that in metastatic 
colon tumors only ~4% of genes within amplified regions were 
found more highly (>2-fold) expressed, when compared with 
normal colonic epithelium. This report differs substantially from 
our finding that 62% of highly amplified genes in breast cancer 
exhibit at least 2-fold increased expression. These contrasting 
findings may reflect methodological differences between the 
studies. For example, the study of Platzer et al. (15) may have 
systematically under-measured gene expression changes. In this 
regard it is remarkable that only 14 transcripts of many thousand 
residing within unamplified chromosomal regions were found to 
exhibit at least 4-fold altered expression in metastatic colon 
cancer. Additionally, their reliance on lower-resolution chromo- 
somal CGH may have resulted in poorly delimiting the bound- 
aries of high-complexity amplicons, effectively overcalling re- 
gions with amplification. Alternatively, the contrasting findings 
for amplified genes may represent real biological differences 
between breast and metastatic colon tumors; resolution of this 
issue will require further studies. 

Our finding that widespread DNA copy number alteration has 
a large, pervasive and direct effect on global gene expression 
patterns in breast cancer has several important implications. 
First, this finding supports a high degree of copy number- 
dependent gene expression in tumors. Second, it suggests that 
most genes are not subject to specific autoregulation or dosage 
compensation. Third, this finding cautions that elevated expres- 
sion of an amplified gene cannot alone be considered strong 
independent evidence of a candidate oncogene's role in tumor- 
igenesis. In our study, fully 62% of highly amplified genes 
demonstrated moderately or highly elevated expression. This 
highlights the importance of high-resolution mapping of ampli- 
con boundaries and shape [to identify the "driving" gene(s) 
within amplicons (16)], on a large number of samples, in addition 
to functional studies. Fourth, this finding suggests that analyzing 

HER-2/neu Breast Cancer Predictive Testing 

Julie Sanford Hanna, Ph.D. and Dan Mornin, M.D. 



Each year, over 182,000 women in the United States are 
diagnosed with breast cancer, and approximately 45,000 die 
of the disease. 1 Incidence appears to be increasing in the 
United States at a rate of roughly 2% per year. The reasons 
for the increase are unclear, but non-genetic risk factors appear 
to play a large role. 2 

Five-year survival rates range from approximately 65%- 
85%, depending on demographic group, with a significant 
percentage of women experiencing recurrence of their cancer 
within 10 years of diagnosis. One of the factors most predic- 
tive for recurrence once a diagnosis of breast cancer has been 
made is the number of axillary lymph nodes to which tumor 
has metastasized. Most node-positive women are given adju- 
vant therapy, which increases their survival. However, 20%- 
30% of patients without axillary node involvement also 
develop recurrent disease, and the difficulty lies in how to iden- 
tify this high-risk subset of patients. These patients could 
benefit from increased surveillance, early intervention, and 
treatment. 

Prognostic markers currently used in breast cancer recur- 
rence prediction include tumor size, histological grade, steroid 
hormone receptor status, DNA ploidy, proliferative index, and 
cathepsin D status. Expression of growth factor receptors and 
over-expression of the HER-2/neu oncogene have also been 
identified as having value regarding treatment regimen and 
prognosis. 

HER-2/neu (also known as c-erbB2) is an oncogene that 
encodes a transmembrane glycoprotein that is homologous 
to, but distinct from, the epidermal growth factor receptor. 
Numerous studies have indicated that high levels of expres- 
sion of this protein are associated with rapid tumor growth, 
certain forms of therapy resistance, and shorter disease-free 
survival. The gene has been shown to be amplified and/or 
overexpressed in 10%-30% of invasive breast cancers and in 
40%-60% of intraductal breast carcinoma. 3 

There are two distinct FDA-approved methods by which 
HER-2/neu status can be evaluated: immunohistochemistry 
(IHC, HercepTest™) and FISH (fluorescent in situ hybridiza- 
tion, PathVysion™ Kit). Both methods can be performed on 
archived and current specimens. The first method allows visual 
assessment of the amount of HER-2/neu protein present on 
the cell membrane. The latter method allows direct quantifi- 
cation of the level of gene amplification present in the tumor, 
enabling differentiation between low- versus high-amplifica- 
tion. At least one study has demonstrated a difference in 
recurrence risk in women younger than 40 years of age for 
low- versus high-amplified tumors (54.5% compared to 
85.7%); this is compared to a recurrence rate of 16.7% for 
patients with no HER-2/neu gene amplification. 4 HER-2/neu 
status may be particularly important to establish in women with 
small (< 1 cm) tumor size. 

The choice of methodology for determination of HER-2/ 
neu status depends in part on the clinical setting. FDA approval 
for the Vysis FISH test was granted based on clinical trials 
involving 1549 node-positive patients. Patients received one 
of three different treatments consisting of different doses of 
cyclophosphamide, Adriamycin, and 5-fluorouracil (CAF). 
The study showed that patients with amplified HER-2/neu 
benefited from treatment with higher doses of adriamycin- 
based therapy, while those with normal HER-2/neu levels did 
not. The study therefore identified a sub-set of women, who 
because they did not benefit from more aggressive treatment, 
did not need to be exposed to the associated side effects. In 
addition, other evidence indicates that HER-2/neu amplifica- 
tion in node-negative patients can be used as an independent 
prognostic indicator for early recurrence, recurrent disease at 
any time and disease-related death. 5 Demonstration of HER- 
2/neu gene amplification by FISH has also been shown to be 
of value in predicting response to chemotherapy in stage-2 
breast cancer patients. 

Selection of patients for Herceptin® (Trastuzumab) mono- 
clonal antibody therapy, however, is based upon demonstra- 
tion of HER-2/neu protein overexpression using HercepTest™. 
Studies using Herceptin® in patients with metastatic breast 
cancer show an increase in time to disease progression, 
increased response rate to chemotherapeutic agents and a small 
increase in overall survival rate. The FISH assays have not yet 
been approved for this purpose, and studies looking at response 
to Herceptin® in patients with or without gene amplification 
status determined by FISH are in progress. 

In general, FISH and IHC results correlate well. However, 
subsets of tumors are found which show discordant results; 
i.e., protein overexpression without gene amplification or lack 
of protein overexpression with gene amplification. The clini- 
cal significance of such results is unclear. Based on the above 
considerations, HER-2/neu testing at SHMC/PAML will uti- 
lize immunohistochemistry (HercepTest®) as a screen, fol- 
lowed by FISH in IHC-negative cases. Alternatively, either 
method may be ordered individually depending on the clini- 
cal setting or clinician preference. 

CPT code information 

HER-2/neu via IHC 
88342 (including interpretive report) 

HER-2/neu via FISH 

88271 *2 Molecular cytogenetics, DNA probe, each 
88274 Molecular cytogenetics, interphase in situ hybrid- 
ization, analyze 25-99 cells 
88291 Cytogenetics and molecular cytogenetics, interpre- 
tation and report 

Procedural Information 

Immunohistochemistry is performed using the FDA-approved 
DAKO antibody kit, HercepTest®. The DAKO kit contains 
reagents required to complete a two-step immunohisto- 
chemical staining procedure for routinely processed, paraffin- 
embedded specimens. Following incubation with the primary 
rabbit antibody to human HER-2/neu protein, the kit employs 
a ready-to-use dextran-based visualization reagent. This re- 
agent consists of both secondary goat anti-rabbit antibody 
molecules and horseradish peroxidase molecules linked to a 
common dextran polymer backbone, thus eliminating the need 
for sequential application of link antibody and peroxidase 
conjugated antibody. Enzymatic conversion of the subse- 
quently added chromogen results in formation of visible 
reaction product at the antigen site. The specimen is then coun- 
terstained; a pathologist using light-microscopy interprets 
results. 

FISH analysis at SHMC/PAML is performed using the 
FDA-approved PathVysion™ HER-2/neu DNA probe kit, pro- 
duced by Vysis, Inc. Formalin-fixed, paraffin-embedded breast 
tissue is processed using routine histological methods, and then 
slides are treated to allow hybridization of DNA probes to the 
nuclei present in the tissue section. The PathVysion™ kit con- 
tains two direct-labeled DNA probes, one specific for the 
alphoid repetitive DNA (CEP 17, spectrum orange) present at 
the chromosome 17 centromere and the second for the HER- 
2/neu oncogene located at 17q11.2-12 (spectrum green). Enu- 
meration of the probes allows a ratio of the number of copies 
of HER-2/neu to the number of copies of chromosome 17 to 
be obtained; this enables quantification of low versus high 
amplification levels, and allows an estimate of the percentage 
of cells with HER-2/neu gene amplification. The clinically 
relevant distinction is whether the gene amplification is due 
to increased gene copy number on the two chromosome 17 
homologues normally present or an increase in the number of 
chromosome 17s in the cells. In the majority of cases, ratio 
equivalents less than 2.0 are indicative of a normal/negative 
result; ratios of 2.1 and over indicate that amplification is 
present and to what degree. Interpretation of this data will be 
performed and reported from the Vysis-certified Cytogenet- 
ics laboratory at SHMC. 
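
The ratio-based reading of the FISH result can be sketched as below. This is an illustration only, not the laboratory's reporting logic: the function names are hypothetical, the ratio is taken as HER-2/neu signals divided by chromosome 17 centromere (CEP 17) signals (the direction under which values of 2.1 and over indicate amplification), and the "indeterminate" label for values between 2.0 and 2.1 is an assumption, since the text does not address that interval.

```python
def her2_cep17_ratio(her2_signals, cep17_signals):
    """Ratio of HER-2/neu signals to CEP 17 signals counted over the same nuclei."""
    if cep17_signals <= 0:
        raise ValueError("CEP 17 signal count must be positive")
    return her2_signals / cep17_signals


def interpret_ratio(ratio):
    """Apply the cutoffs quoted in the text to a HER-2/neu : CEP 17 ratio."""
    if ratio < 2.0:
        return "normal/negative"
    if ratio >= 2.1:
        return "amplified"
    return "indeterminate"  # assumption: interval not covered by the text


# Invented counts: extra copies of chromosome 17 raise both counts together,
# so the ratio stays low even though HER-2/neu copy number per cell is raised.
polysomy_case = interpret_ratio(her2_cep17_ratio(120, 80))
amplified_case = interpret_ratio(her2_cep17_ratio(200, 40))
```

Dividing by the centromere count is what separates true gene amplification from an increase in the number of chromosome 17 copies, the distinction the text describes as clinically relevant.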

DECLARATION OF AUDREY D. GODDARD, Ph.D. UNDER 37 C.F.R. § 1.132 

Assistant Commissioner of Patents 
Washington, D.C. 20231 

Sir: 

I, Audrey D. Goddard, Ph.D., do hereby declare and say as follows: 

1. I am a Senior Clinical Scientist at the Experimental Medicine/BioOncology, Medical 
Affairs Department of Genentech, Inc., South San Francisco, California 94080. 

2. Between 1993 and 2001, I headed the DNA Sequencing Laboratory at the Molecular 
Biology Department of Genentech, Inc. During this time, my responsibilities included the 
identification and characterization of genes contributing to the oncogenic process, and determination 
of the chromosomal localization of novel genes. 

3. My scientific Curriculum Vitae, including my list of publications, is attached to and 
forms part of this Declaration (Exhibit A). 

Serial No.: * 
Filed:* 

4. I am familiar with a variety of techniques known in the art for detecting and 
quantifying the amplification of oncogenes in cancer, including the quantitative TaqMan PCR (i.e., 
"gene amplification") assay described in the above captioned patent application. 

5. The TaqMan PCR assay is described, for example, in the following scientific 
publications: Higuchi et al., Biotechnology 10:413-417 (1992) (Exhibit B); Livak et al., PCR 
Methods Appl. 4:357-362 (1995) (Exhibit C) and Heid et al., Genome Res. 6:986-994 (1996) 
(Exhibit D). Briefly, the assay is based on the principle that successful PCR yields a fluorescent 
signal due to Taq DNA polymerase-mediated exonuclease digestion of a fluorescently labeled 
oligonucleotide that is homologous to a sequence between two PCR primers. The extent of 
digestion depends directly on the amount of PCR, and can be quantified accurately by measuring the 
increment in fluorescence that results from decreased energy transfer. This is an extremely sensitive 
technique, which allows detection in the exponential phase of the PCR reaction and, as a result, 
leads to accurate determination of gene copy number. 

6. The quantitative fluorescent TaqMan PCR assay has been extensively and 
successfully used to characterize genes involved in cancer development and progression. 
Amplification of protooncogenes has been studied in a variety of human tumors, and is widely 
considered as having etiological, diagnostic and prognostic significance. This use of the quantitative 
TaqMan PCR assay is exemplified by the following scientific publications: Pennica et al., Proc. 
Natl. Acad. Sci. USA 95(25):14717-14722 (1998) (Exhibit E); Pitti et al., Nature 
396(6712):699-703 (1998) (Exhibit F) and Bieche et al., Int. J. Cancer 78:661-666 (1998) (Exhibit 
G), the first two of which I am co-author. In particular, Pennica et al. have used the quantitative 
TaqMan PCR assay to study relative gene amplification of WISP and c-myc in various cell lines, 
colorectal tumors and normal mucosa. Pitti et al. studied the genomic amplification of a decoy 
receptor for Fas ligand in lung and colon cancer, using the quantitative TaqMan PCR assay. Bieche 
et al. used the assay to study gene amplification in breast cancer. 

7. It is my personal experience that the quantitative TaqMan PCR technique is 
technically sensitive enough to detect at least a 2-fold increase in gene copy number relative to 
control. It is further my considered scientific opinion that an at least 2-fold increase in gene copy 
number in a tumor tissue sample relative to a normal (i.e., non-tumor) sample is significant and 
useful in that the detected increase in gene copy number in the tumor sample relative to the normal 
sample serves as a basis for using relative gene copy number as quantitated by the TaqMan PCR 
technique as a diagnostic marker for the presence or absence of tumor in a tissue sample of unknown 
pathology. Accordingly, a gene identified as being amplified at least 2-fold by the quantitative 
TaqMan PCR assay in a tumor sample relative to a normal sample is useful as a marker for the 
diagnosis of cancer, for monitoring cancer development and/or for measuring the efficacy of cancer 
therapy. 

8. I declare further that all statements made herein of my own knowledge are true and 
that all statements made on information and belief are believed to be true. I declare that these 
statements were made with the knowledge that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States 
Code, and that such willful false statements may jeopardize the validity of the application or any 
patent issuing thereon. 

110 Congo St. 

San Francisco, CA, 94131 

415.841.9154 

415.819.2247 (mobile) 

PROFESSIONAL EXPERIENCE 

Genentech, Inc. 1993-present 
South San Francisco, CA 

2001 - present Senior Clinical Scientist 

Experimental Medicine / BioOncology, Medical Affairs 

Responsibilities: 

• Companion diagnostic oncology products 

• Acquisition of clinical samples from Genentech's clinical trials for translational research 

• Translational research using clinical specimen and data for drug development and 
diagnostics 

• Member of Development Science Review Committee, Diagnostic Oversight Team, 21 CFR 
Part 11 Subteam 

Interests: 

• Ethical and legal implications of experiments with clinical specimens and data 

• Application of pharmacogenomics in clinical trials 



1998 - 2001 Senior Scientist 

Head of the DNA Sequencing Laboratory, Molecular Biology Department, Research 
Responsibilities: 

• Management of a laboratory of up to nineteen, including postdoctoral fellow, associate 
scientist, senior research associate and research assistant/associate levels 

• Management of a $750K budget 

• DNA sequencing core facility supporting a 350+ person research facility. 

• DNA sequencing for high throughput gene discovery, - ESTs, cDNAs, and constructs 

• Genomic sequence analysis and gene identification 

• DNA sequence and primary protein analysis 

Research: 

• Chromosomal localization of novel genes 

• Identification and characterization of genes contributing to the oncogenic process 

• Identification and characterization of genes contributing to inflammatory diseases 

• Design and development of schemes for high throughput genomic DNA sequence analysis 

• Candidate gene prediction and evaluation 



Genentech, Inc. 
1 DNA Way 

South San Francisco, CA, 94080 

650.225.6429 

goddarda@gene.com 

1993 - 1998 Scientist 



Head of the DNA Sequencing Laboratory, Molecular Biology Department, Research 
Responsibilities 

• DNA sequencing core facility supporting a 350+ person research facility 

• Assumed responsibility for a pre-existing team of five technicians and expanded the group 
into fifteen, introducing a level of middle management and additional areas of research 

• Participated in the development of the basic plan for high throughput secreted protein 
discovery program - sequencing strategies, data analysis and tracking, database design 

• High throughput EST and cDNA sequencing for new gene identification. 

• Design and implementation of analysis tools required for high throughput gene identification. 

• Chromosomal localization of genes encoding novel secreted proteins. 

Research: 

• Genomic sequence scanning for new gene discovery. 

• Development of signal peptide selection methods. 

• Evaluation of candidate disease genes. 

• Growth hormone receptor gene SNPs in children with Idiopathic short stature 

Imperial Cancer Research Fund 1989-1992 
London, UK with Dr. Ellen Solomon 

6/89-12/92 Postdoctoral Fellow 

• Cloning and characterization of the genes fused at the acute promyelocytic leukemia 
translocation breakpoints on chromosomes 17 and 15. 

• Prepared a successfully funded European Union multi-center grant application 



McMaster University 

Hamilton, Ontario, Canada with Dr. G. D. Sweeney 
5/83 - 8/83: NSERC Summer Student 

• In vitro metabolism of β-naphthoflavone in C57BL/6J and DBA mice 



EDUCATION 



Ph.D.    1989 
University of Toronto, Toronto, Ontario, Canada. 
Department of Medical Biophysics. 
"Phenotypic and genotypic effects of mutations in 
the human retinoblastoma gene." 
Supervisor: Dr. R. A. Phillips 

Honours B.Sc.    1983 
McMaster University, Hamilton, Ontario, Canada. 
Department of Biochemistry 
"The in vitro metabolism of the cytochrome P-448 
inducer β-naphthoflavone in C57BL/6J mice." 
Supervisor: Dr. G. D. Sweeney 

ACADEMIC AWARDS 



Imperial Cancer Research Fund Postdoctoral Fellowship    1989-1992 

Medical Research Council Studentship    1983-1988 

NSERC Undergraduate Summer Research Award    1983 

Society of Chemical Industry Merit Award (Hons. Biochem.)    1983 

Dr. Harry Lyman Hooker Scholarship    1981-1983 

J.L.W. Gill Scholarship    1981-1982 

Business and Professional Women's Club Scholarship    1980-1981 

Wyerhauser Foundation Scholarship    1979-1980 


INVITED PRESENTATIONS 






Genentech's gene discovery pipeline: High throughput identification, cloning and 
characterization of novel genes. Functional Genomics: From Genome to Function, Litchfield 
Park, AZ, USA. October 2000 

High throughput identification, cloning and characterization of novel genes. G2K: Back to 
Science, Advances in Genome Biology and Technology I. Marco Island, FL, USA. February 2000 

Quality control in DNA Sequencing: The use of Phred and Phrap. Bay Area Sequencing 
Users Meeting, Berkeley, CA, USA. April 1999 

High throughput secreted protein identification and cloning. Tenth International Genome 
Sequencing and Analysis Conference, Miami, FL, USA. September 1998 

The evolution of DNA sequencing: The Genentech perspective. Bay Area Sequencing Users 
Meeting, Berkeley, CA, USA. May 1998 

Partial Growth Hormone Insensitivity: The role of GH-receptor mutations in Idiopathic Short 
Stature. Tenth Annual National Cooperative Growth Study Investigators Meeting, San 
Francisco, CA, USA. October 1996 

Growth hormone (GH) receptor defects are present in selected children with non-GH-deficient 
short stature: A molecular basis for partial GH-insensitivity. 76th Annual Meeting of The 
Endocrine Society, Anaheim, CA, USA. June 1994 

A previously uncharacterized gene, myl, is fused to the retinoic acid receptor alpha gene in 
acute promyelocytic leukemia. XV International Association for Comparative Research on 
Leukemia and Related Disease, Padua, Italy. October 1991 

PATENTS 



Goddard A, Godowski PJ, Gurney AL. NL2 Tie ligand homologue polypeptide. Patent 
Number: 6,455,496. Date of Patent: Sept. 24, 2002. 

Goddard A, Godowski PJ and Gurney AL. NL3 Tie ligand homologue nucleic acids. Patent 
Number: 6,426,218. Date of Patent: July 30, 2002. 

Godowski P, Gurney A, Hillan KJ, Botstein D, Goddard A, Roy M, Ferrara N, Tumas D, 
Schwall R. NL4 Tie ligand homologue nucleic acid. Patent Number: 6,413,770. Date of 
Patent: July 2, 2002. 

Ashkenazi A, Fong S, Goddard A, Gurney AL, Napier MA, Tumas D, Wood WI. Nucleic acid 
encoding A-33 related antigen polypeptides. Patent Number: 6,410,708. Date of Patent: 
Jun. 25, 2002. 

Botstein DA, Cohen RL, Goddard AD, Gurney AL, Hillan KJ, Lawrence DA, Levine AJ, 
Pennica D, Roy MA and Wood WI. WISP polypeptides and nucleic acids encoding same. 
Patent Number: 6,387,657. Date of Patent: May 14, 2002. 

Goddard A, Godowski PJ and Gurney AL. Tie ligands. Patent Number: 6,372,491. Date of 
Patent: April 16, 2002. 

Godowski PJ, Gurney AL, Goddard A and Hillan K. TIE ligand homologue antibody. Patent 
Number: 6,350,450. Date of Patent: Feb. 26, 2002. 

Fong S, Ferrara N, Goddard A, Godowski PJ, Gurney AL, Hillan K and Williams PM. Tie 
receptor tyrosine kinase ligand homologues. Patent Number: 6,348,351. Date of Patent: 
Feb. 19, 2002. 

Goddard A, Godowski PJ and Gurney AL. Ligand homologues. Patent Number: 6,348,350. 
Date of Patent: Feb. 19, 2002. 

Attie KM, Carlsson LMS, Gesundheit N and Goddard A. Treatment of partial growth 
hormone insensitivity syndrome. Patent Number: 6,207,640. Date of Patent: March 27, 
2001. 

Fong S, Ferrara N, Goddard A, Godowski PJ, Gurney AL, Hillan K and Williams PM. Nucleic 
acids encoding NL-3. Patent Number: 6,074,873. Date of Patent: June 13, 2000 

Attie K, Carlsson LMS, Gesundheit N and Goddard A. Treatment of partial growth hormone 
insensitivity syndrome. Patent Number: 5,824,642. Date of Patent: October 20, 1998 

Attie K, Carlsson LMS, Gesundheit N and Goddard A. Treatment of partial growth hormone 
insensitivity syndrome. Patent Number: 5,646,113. Date of Patent: July 8, 1997 



Multiple additional provisional applications filed 

PUBLICATIONS 

Seshasayee D, Dowd P, Gu Q, Erickson S, Goddard AD. Comparative sequence analysis of 
the HER2 locus in mouse and man. Manuscript in preparation. 

Abuzzahab MJ, Goddard A, Grigorescu F, Lautier C, Smith RJ and Chernausek SD. Human 
IGF-1 receptor mutations resulting in pre- and post-natal growth retardation. Manuscript in 
preparation. 

Aggarwal S, Xie M-H, Foster J, Frantz G, Stinson J, Corpuz RT, Simmons L, Hillan K, 
Yansura DG, Vandlen RL, Goddard AD and Gurney AL. FHFR, a novel receptor for the 
fibroblast growth factors. Manuscript submitted. 

Adams SH, Chui C, Schilbach SL, Yu XX, Goddard AD, Grimaldi JC, Lee J, Dowd P, Colman 
S., Lewin DA. (2001) BFIT, a unique acyl-CoA thioesterase induced in thermogenic brown 
adipose tissue: Cloning, organization of the human gene, and assessment of a potential link 
to obesity. Biochemical Journal 360: 135-142. 

Lee J, Ho WH, Maruoka M, Corpuz RT, Baldwin DT, Foster JS, Goddard AD, Yansura DG, 
Vandlen RL, Wood WI, Gurney AL. (2001) IL-17E, a novel proinflammatory ligand for the 
IL-17 receptor homolog IL-17Rh1. Journal of Biological Chemistry 276(2): 1660-1664. 

Xie M-H, Aggarwal S, Ho W-H, Foster J, Zhang Z, Stinson J, Wood WI, Goddard AD and 
Gurney AL. (2000) Interleukin (IL)-22, a novel human cytokine that signals through the 
interferon-receptor related proteins CRF2-4 and IL-22R. Journal of Biological Chemistry 275: 
31335-31339. 

Weiss GA, Watanabe CK, Zhong A, Goddard A and Sidhu SS. (2000) Rapid mapping of 
protein functional epitopes by combinatorial alanine scanning. Proc. Natl. Acad. Sci. USA 97: 
8950-8954. 

Guo S, Yamaguchi Y, Schilbach S, Wada T, Lee J, Goddard A, French D, Handa H, 
Rosenthal A. (2000) A regulator of transcriptional elongation controls vertebrate neuronal 
development. Nature 408: 366-369. 

Yan M, Wang L-C, Hymowitz SG, Schilbach S, Lee J, Goddard A, de Vos AM, Gao WQ, Dixit 
VM. (2000) Two-amino acid molecular switch in an epithelial morphogen that regulates 
binding to two distinct receptors. Science 290: 523-527. 

Sehl PD, Tai JTN, Hillan KJ, Brown LA, Goddard A, Yang R, Jin H and Lowe DG. (2000) 
Application of cDNA microarrays in determining molecular phenotype in cardiac growth, 
development, and response to injury. Circulation 101: 1990-1999. 

Guo S, Brush J, Teraoka H, Goddard A, Wilson SW, Mullins MC and Rosenthal A. (1999) 
Development of noradrenergic neurons in the zebrafish hindbrain requires BMP, FGF8, and 
the homeodomain protein soulless/Phox2A. Neuron 24: 555-566. 

Stone D, Murone M, Luoh S, Ye W, Armanini P, Gurney A, Phillips HS, Brush J, Goddard 
A, de Sauvage FJ and Rosenthal A. (1999) Characterization of the human suppressor of 
fused; a negative regulator of the zinc-finger transcription factor Gli. J. Cell Sci. 112: 
4437-4448. 

Xie M-H, Holcomb I, Deuel B, Dowd P, Huang A, Vagts A, Foster J, Liang J, Brush J, Gu Q, 
Hillan K, Goddard A and Gurney AL. (1999) FGF-19, a novel fibroblast growth factor with 
unique specificity for FGFR4. Cytokine 11: 729-735. 

Yan M, Lee J, Schilbach S, Goddard A and Dixit V. (1999) mE10, a novel caspase 
recruitment domain-containing proapoptotic molecule. J. Biol. Chem. 274(15): 10287-10292. 

Gurney AL, Marsters SA, Huang RM, Pitti RM, Mark DT, Baldwin DT, Gray AM, Dowd P, 
Brush J, Heldens S, Schow P, Goddard AD, Wood WI, Baker KP, Godowski PJ and 
Ashkenazi A. (1999) Identification of a new member of the tumor necrosis factor family and its 
receptor, a human ortholog of mouse GITR. Current Biology 9(4): 215-218. 

Ridgway JBB, Ng E, Kern JA, Lee J, Brush J, Goddard A and Carter P. (1999) Identification 
of a human anti-CD55 single-chain Fv by subtractive panning of a phage library using tumor 
and nontumor cell lines. Cancer Research 59: 2718-2723. 

Pitti RM, Marsters SA, Lawrence DA, Roy M, Kischkel FC, Dowd P, Huang A, Donahue CJ, 
Sherwood SW, Baldwin DT, Godowski PJ, Wood WI, Gurney AL, Hillan KJ, Cohen RL, 
Goddard AD, Botstein D and Ashkenazi A. (1998) Genomic amplification of a decoy receptor 
for Fas ligand in lung and colon cancer. Nature 396(6712): 699-703. 

Pennica D, Swanson TA, Welsh JW, Roy MA, Lawrence DA, Lee J, Brush J, Taneyhill LA, 
Deuel B, Lew M, Watanabe C, Cohen RL, Melhem MF, Finley GG, Quirke P, Goddard AD, 
Hillan KJ, Gurney AL, Botstein D and Levine AJ. (1998) WISP genes are members of the 

We have enhanced the polymerase chain reaction (PCR) such that specific DNA sequences can be 
detected without opening the reaction tube. This enhancement requires the addition of 
ethidium bromide (EtBr) to a PCR. Since the fluorescence of EtBr increases in the presence 
of double-stranded (ds) DNA, an increase in fluorescence in such a PCR indicates a positive 
amplification, which can be easily monitored externally. In fact, amplification can be 
continuously monitored in order to follow its progress. The ability to simultaneously 
amplify specific DNA sequences and detect the product of the amplification both simplifies 
and improves PCR and may facilitate its automation and more widespread use in the clinic or 
in other situations requiring high sample throughput. 

Although the potential benefits of PCR¹ to clinical diagnostics are well known²,³, it is 
still not widely used in this setting, even though it is four years since thermostable DNA 
polymerases⁴ made PCR practical. Some of the reasons for its slow acceptance are high cost, 
lack of automation of pre- and post-PCR processing steps, and false positive results from 
carryover-contamination. The first two points are related in that labor is the largest 
contributor to cost at the present stage of PCR development. Most current assays require 
some form of "downstream" processing once thermocycling is done in order to determine 
whether the target DNA sequence was present and has amplified. These include DNA 
hybridization⁵,⁶, gel electrophoresis with or without use of restriction digestion⁷,⁸, 
HPLC⁹, or capillary electrophoresis¹⁰. These methods are labor-intense, have low 
throughput, and are difficult to automate. The third point is also closely related to 
downstream processing. The handling of the PCR product in these downstream processes 
increases the chances that amplified DNA will spread through the typing lab, resulting in a 
risk of "carryover" false positives in subsequent testing¹¹. 

These downstream processing steps would be eliminated if specific amplification and 
detection of amplified DNA took place simultaneously within an unopened reaction vessel. 
Assays in which such different processes take place without the need to separate reaction 
components have been termed "homogeneous". No truly homogeneous PCR assay has been 
demonstrated to date, although progress towards this end has been reported. Chehab, et 
al.¹² developed a PCR product detection scheme using fluorescent primers that resulted in a 
fluorescent PCR product. Allele-specific primers, each with different fluorescent tags, 
were used to indicate the genotype of the DNA. However, the unincorporated primers must 
still be removed in a downstream process in order to visualize the result. Recently, 
Holland, et al.¹³ developed an assay in which the endogenous 5' exonuclease activity of Taq 
DNA polymerase was exploited to cleave a labeled oligonucleotide probe. The probe would 
only cleave if PCR amplification had produced its complementary sequence. In order to 
detect the cleavage products, however, a subsequent process is again needed. 

We have developed a truly homogeneous assay for PCR and PCR product detection based upon 
the greatly increased fluorescence that ethidium bromide and other DNA binding dyes exhibit 
when they are bound to ds-DNA¹⁴⁻¹⁶. As outlined in Figure 1, a prototypic PCR begins with 
primers that are single-stranded DNA (ss-DNA), dNTPs, and DNA polymerase. An amount of 
dsDNA containing the target sequence (target DNA) is also typically present. This amount 
can vary, depending on the application, from single-cell amounts of DNA¹⁷ to micrograms per 
PCR¹⁸. If EtBr is present, the reagents that will fluoresce, in order of increasing 
fluorescence, are free EtBr itself, and EtBr bound to the single-stranded DNA primers and 
to the double-stranded target DNA (by its intercalation between the stacked bases of the 
DNA double-helix). After the first denaturation cycle, target DNA will be largely 
single-stranded. After a PCR is completed, the most significant change is the increase in 
the amount of dsDNA (the PCR product itself) of up to several micrograms. Formerly free 
EtBr is bound to the additional dsDNA, resulting in an increase in fluorescence. There is 
also some decrease in the amount of ssDNA primer, but because the binding of EtBr to ssDNA 
is much less than to dsDNA, the effect of this change on the total fluorescence of the 
sample is small. The fluorescence increase can be measured by directing excitation 
illumination through the walls of the amplification vessel before and after, or even 
continuously during, thermocycling. 

FIGURE 1. Principle of simultaneous amplification and detection of PCR product. The 
components of a PCR containing EtBr that are fluorescent are listed: EtBr itself, EtBr 
bound to either ssDNA or dsDNA. There is a large fluorescence enhancement when EtBr is 
bound to DNA and binding is greatly enhanced when DNA is double-stranded. After sufficient 
(n) cycles of PCR, the net increase in dsDNA results in additional EtBr binding, and a net 
increase in total fluorescence. 

FIGURE 2. Gel electrophoresis of PCR amplification products of the human, nuclear gene, 
HLA DQα, made in the presence of increasing amounts of EtBr (up to 8 µg/ml). The presence 
of EtBr has no obvious effect on the yield or specificity of amplification. 

FIGURE 3. (A) Fluorescence measurements from PCRs that contain 0.5 µg/ml EtBr and that are 
specific for Y-chromosome repeat sequences. Five replicate PCRs were begun containing each 
of the DNAs specified. At each indicated cycle, one of the five replicate PCRs for each DNA 
was removed from thermocycling and its fluorescence measured. Units of fluorescence are 
arbitrary. (B) UV photography of PCR tubes (0.5 ml Eppendorf-style, polypropylene 
micro-centrifuge tubes) containing reactions, those starting from 2 ng male DNA and control 
reactions without any DNA, from (A). 

RESULTS 

PCR in the presence of EtBr. In order to assess the effect of EtBr in PCR, amplifications 
of the human HLA DQα gene¹⁹ were performed with the dye present at concentrations from 0.06 
to 8.0 µg/ml (a typical concentration of EtBr used in staining of nucleic acids following 
gel electrophoresis is 0.5 µg/ml). As shown in Figure 2, gel electrophoresis revealed 
little or no difference in the yield or quality of the amplification product whether EtBr 
was absent or present at any of these concentrations, indicating that EtBr does not inhibit 
PCR. 

Detection of human Y-chromosome specific sequences. Sequence-specific, fluorescence 
enhancement of EtBr as a result of PCR was demonstrated in a series of amplifications 
containing 0.5 µg/ml EtBr and primers specific to repeat DNA sequences found on the human 
Y-chromosome²⁰. These PCRs initially contained either 60 ng male, 60 ng female, 2 ng male 
human or no DNA. Five replicate PCRs were begun for each DNA. After 0, 17, 21, 24 and 29 
cycles of thermocycling, a PCR for each DNA was removed from the thermocycler, and its 
fluorescence measured in a spectrofluorometer and plotted vs. amplification cycle number 
(Fig. 3A). The shape of this curve reflects the fact that by the time an increase in 
fluorescence can be detected, the increase in DNA is becoming linear and not exponential 
with cycle number. As shown, the fluorescence increased about three-fold over the 
background fluorescence for the PCRs containing human male DNA, but did not significantly 
increase for negative control PCRs, which contained either no DNA or human female DNA. The 
more male DNA present to begin with (60 ng versus 2 ng), the fewer cycles were needed to 
give a detectable increase in fluorescence. Gel electrophoresis on the products of these 
amplifications showed that DNA fragments of the expected size were made in the male DNA 
containing reactions and that little DNA synthesis took place in the control samples. 
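
The relationship between starting DNA and the number of cycles needed for a detectable signal can be illustrated with a minimal Python sketch. The function name, the 1.5-fold detection criterion, and all fluorescence values below are assumptions chosen for illustration; they are not data from Figure 3A.

```python
def first_detectable_cycle(readings, background, fold=1.5):
    """Return the earliest sampled cycle whose fluorescence is at least
    `fold` times the background, or None if no reading reaches it.

    readings: iterable of (cycle_number, fluorescence) pairs.
    The `fold` criterion is a hypothetical choice, not one from the paper.
    """
    for cycle, fluorescence in sorted(readings):
        if fluorescence >= fold * background:
            return cycle
    return None


# Illustrative values only (arbitrary units); the sampled cycle numbers
# follow the text (0, 17, 21, 24 and 29 cycles).
sampled_cycles = [0, 17, 21, 24, 29]
male_60ng = list(zip(sampled_cycles, [1.0, 1.1, 1.8, 2.6, 3.0]))
male_2ng = list(zip(sampled_cycles, [1.0, 1.0, 1.1, 1.7, 2.9]))
no_dna = list(zip(sampled_cycles, [1.0, 1.0, 1.0, 1.05, 1.1]))

print(first_detectable_cycle(male_60ng, background=1.0))
print(first_detectable_cycle(male_2ng, background=1.0))
print(first_detectable_cycle(no_dna, background=1.0))
```

With these made-up readings the sample with more starting DNA crosses the criterion at an earlier sampled cycle, and the no-DNA control never crosses it.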

In addition, the increase in fluorescence was visualized by simply laying the completed, 
unopened PCRs on a UV transilluminator and photographing them through a red filter. This is 
shown in Figure 3B for the reactions that began with 2 ng male DNA and those with no DNA. 

Detection of specific alleles of the human β-globin gene. In order to demonstrate that this 
approach has adequate specificity to allow genetic screening, a detection of the 
sickle-cell anemia mutation was performed. Figure 4 shows the fluorescence from completed 
amplifications containing EtBr (0.5 µg/ml) as detected by photography of the reaction tubes 
on a UV transilluminator. These reactions were performed using primers specific for either 
the wild-type or sickle-cell mutation of the human β-globin gene²¹. The specificity for 
each allele is imparted by placing the sickle-mutation site at the terminal 3' nucleotide 
of one primer. By using an appropriate primer annealing temperature, primer extension (and 
thus amplification) can take place only if the 3' nucleotide of the primer is complementary 
to the β-globin allele present. Each pair of amplifications shown in Figure 4 consists of a 
reaction with either the wild-type allele specific (left tube) or sickle-allele specific 
(right tube) primers. Three different DNAs were typed: DNA from a homozygous, wild-type 
β-globin individual (AA); from a heterozygous sickle β-globin individual (AS); and from a 
homozygous sickle β-globin individual (SS). Each DNA (50 ng genomic DNA to start each PCR) 
was analyzed in triplicate (3 pairs of reactions each). The DNA type was reflected in the 
relative fluorescence intensities in each pair of completed amplifications. There was a 
significant increase in fluorescence only where a β-globin allele DNA matched the primer 
set. When measured on a spectrofluorometer (data not shown), this fluorescence was about 
three times that present in a PCR where both β-globin alleles were mismatched to the primer 
set. Gel electrophoresis (not shown) established that this increase in fluorescence was due 
to the synthesis of nearly a microgram of a DNA fragment of the expected size for β-globin. 
There was little synthesis of dsDNA in reactions in which the allele-specific primer was 
mismatched to both alleles. 
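
The paired-tube readout described above lends itself to a simple decision rule, sketched here in Python. The function name and the two-fold cut-off are hypothetical; the text reports only that matched reactions fluoresced about three times more than fully mismatched ones, so any real threshold would need to be set empirically.

```python
def call_genotype(wild_type_signal, sickle_signal, background, min_fold=2.0):
    """Call a beta-globin genotype from one pair of completed reactions.

    wild_type_signal: fluorescence of the tube with wild-type specific primers.
    sickle_signal:    fluorescence of the tube with sickle specific primers.
    background:       fluorescence of a reaction mismatched to both alleles.
    min_fold:         assumed cut-off; the source reports roughly a three-fold
                      increase for matched primers, 2.0 is an illustrative margin.
    """
    wild_type_positive = wild_type_signal >= min_fold * background
    sickle_positive = sickle_signal >= min_fold * background
    if wild_type_positive and sickle_positive:
        return "AS"
    if wild_type_positive:
        return "AA"
    if sickle_positive:
        return "SS"
    return "no call"


# Made-up fluorescence values in arbitrary units.
print(call_genotype(3.0, 1.0, background=1.0))
print(call_genotype(3.0, 3.1, background=1.0))
print(call_genotype(1.1, 2.9, background=1.0))
```

A pair in which neither tube exceeds the cut-off is reported as "no call" rather than forced into a genotype, since such a result could reflect a failed amplification.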

Continuous monitoring of a PCR. Using a fiber optic device, it is possible to direct 
excitation illumination from a spectrofluorometer to a PCR undergoing thermocycling and to 
return its fluorescence to the spectrofluorometer. The fluorescence readout of such an 
arrangement, directed at an EtBr-containing amplification of Y-chromosome specific 
sequences from 25 ng of human male DNA, is shown in Figure 5. The readout from a control 
PCR with no target DNA is also shown. Thirty cycles of PCR were monitored for each. 

The fluorescence trace as a function of time clearly shows the effect of the thermocycling. 
Fluorescence intensity rises and falls inversely with temperature. The fluorescence 
intensity is minimum at the denaturation temperature (94°C) and maximum at the 
annealing/extension temperature (50°C). In the negative-control PCR, these fluorescence 
maxima and minima do not change significantly over the thirty thermocycles, indicating that 
there is little dsDNA synthesis without the appropriate target DNA, and there is little if 
any bleaching of EtBr during the continuous illumination of the sample. 

In the PCR containing male DNA, the fluorescence maxima at the annealing/extension 
temperature begin to increase at about 4000 seconds of thermocycling, and continue to 
increase with time, indicating that dsDNA is being produced at a detectable level. Note 
that the fluorescence minima at the denaturation temperature do not significantly increase, 
presumably because at this temperature there is no dsDNA for EtBr to bind. Thus the course 
of the amplification is followed by tracking the fluorescence increase at the annealing 
temperature. Analysis of the products of these two amplifications by gel electrophoresis 
showed a DNA fragment of the expected size for the male DNA containing sample and no 
detectable DNA synthesis for the control sample. 
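
Tracking the fluorescence maxima at the annealing/extension temperature amounts to taking one peak value per thermocycle from the continuous trace. The Python sketch below shows one way to do that, assuming a fixed, known cycle duration; the function name, the 100-second cycle length, and the sample trace are all hypothetical.

```python
def per_cycle_maxima(trace, cycle_seconds):
    """Reduce a continuous fluorescence trace to one maximum per thermocycle.

    trace: list of (time_seconds, fluorescence) points.
    cycle_seconds: duration of one thermocycle (assumed constant here).
    Because fluorescence is highest at the annealing/extension temperature,
    each per-cycle maximum approximates the reading at that temperature.
    """
    maxima = {}
    for time_seconds, fluorescence in trace:
        cycle_index = int(time_seconds // cycle_seconds)
        if cycle_index not in maxima or fluorescence > maxima[cycle_index]:
            maxima[cycle_index] = fluorescence
    return [maxima[index] for index in sorted(maxima)]


# Made-up trace: low reading during denaturation, high during annealing.
trace = [(0, 1.0), (50, 2.0), (100, 1.0), (150, 2.5), (200, 1.0), (250, 3.2)]
print(per_cycle_maxima(trace, cycle_seconds=100))
```

In this illustrative trace the minima stay flat while the maxima rise from cycle to cycle, which is the pattern the text describes for the reaction containing male DNA.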

DISCUSSION 

Downstream processes such as hybridization to a sequence-specific probe can enhance the 
specificity of DNA detection by PCR. The elimination of these processes means that the 
specificity of this homogeneous assay depends solely on that of PCR. In the case of 
sickle-cell disease, we have shown that PCR alone has sufficient DNA sequence specificity 
to permit genetic screening. Using appropriate amplification conditions, there is little 
non-specific production of dsDNA in the absence of the appropriate target allele. 

The specificity required to detect pathogens can be more or less than that required to do 
genetic screening, depending on the number of pathogens in the sample and the amount of 
other DNA that must be taken with the sample. A difficult target is HIV, which requires 
detection of a viral genome that can be at the level of a few copies per thousands of host 
cells⁶. Compared with genetic screening, which is performed on cells containing at least 
one copy of the target sequence, HIV detection requires both more specificity and the input 
of more total DNA (up to microgram amounts) in order to have sufficient numbers of target 
sequences. This large amount of starting DNA in an amplification significantly increases 
the background fluorescence over which any additional fluorescence produced by PCR must be 
detected. An additional complication that occurs with targets in low copy-number is the 
formation of the "primer-dimer" artifact. This is the result of the extension of one primer 
using the other primer as a template. Although this occurs infrequently, once it occurs the 
extension product is a substrate for PCR amplification, and can compete with true PCR 
targets if those targets are rare. The primer-dimer product is of course dsDNA and thus is 
a potential source of false signal in this homogeneous assay. 

FIGURE 4. UV photography of PCR tubes containing amplifications using EtBr that are 
specific to wild-type (A) or sickle (S) alleles of the human β-globin gene. The left of 
each pair of tubes contains allele-specific primers to the wild-type alleles, the right 
tube primers to the sickle allele. The photograph was taken after 30 cycles of PCR, and the 
input DNAs and the alleles they contain are indicated (Homozygous AA, Heterozygous AS, 
Homozygous SS). Fifty ng of DNA was used to begin PCR. Typing was done in triplicate (3 
pairs of PCRs) for each input DNA. 

FIGURE 5. Continuous, real-time monitoring of a PCR. A fiber optic was used to carry 
excitation light to a PCR in progress and also emitted light back to a fluorometer (see 
Experimental Protocol). Amplification using human male-DNA specific primers in a PCR 
starting with 20 ng of human male DNA (top), or in a control PCR without DNA (bottom), were 
monitored. Thirty cycles of PCR were followed for each. The temperature cycled between 94°C 
(denaturation) and 50°C (annealing and extension). Note in the male DNA PCR, the cycle 
(time) dependent increase in fluorescence at the annealing/extension temperature. 

To increase PCR specificity and reduce the effect of primer-dimer amplification, we are 
investigating a number of approaches, including the use of nested-primer amplifications 
that take place in a single tube⁸, and the "hot-start", in which nonspecific amplification 
is reduced by raising the temperature of the reaction before DNA synthesis begins. 
Preliminary results using these approaches suggest that primer-dimer is effectively reduced 
and it is possible to detect the increase in EtBr fluorescence in a PCR instigated by a 
single HIV genome in a background of 10* cells. With larger numbers of cells, the 
background fluorescence contributed by genomic DNA becomes problematic. To reduce this 
background, it may be possible to use sequence-specific DNA-binding dyes that can be made 
to preferentially bind PCR product over genomic DNA by incorporating the dye-binding DNA 
sequence into the PCR product through a 5' "add-on" to the oligonucleotide primer. 

We have shown that the detection of fluorescence generated by an EtBr-containing PCR is 
straightforward, both once PCR is completed and continuously during thermocycling. The ease 
with which automation of specific DNA detection can be accomplished is the most promising 
aspect of this assay. The fluorescence analysis of completed PCRs is already possible with 
existing instrumentation in 96-well format. In this format, the fluorescence in each PCR 
can be quantitated before, after, and even at selected points during thermocycling by 
moving the rack of PCRs to a 96-microwell plate fluorescence reader²⁰. 

The instrumentation necessary to continuously monitor multiple PCRs simultaneously is also 
simple in principle. A direct extension of the apparatus used here is to have multiple 
fiberoptics transmit the excitation light and fluorescent emissions to and from multiple 
PCRs. The ability to monitor multiple PCRs continuously may allow quantitation of target 
DNA copy number. Figure 3 shows that the larger the amount of starting target DNA, the 
sooner during PCR a fluorescence increase is detected. Preliminary experiments (Higuchi and 
Dollinger, manuscript in preparation) with continuous monitoring have shown a sensitivity 
to two-fold differences in initial target DNA concentration. 
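
Why a two-fold difference in starting target should be resolvable can be seen from an idealized model, sketched here in Python. It assumes the product exactly doubles each cycle (efficiency 1.0) until a fixed detection threshold is reached; this is a textbook simplification introduced for illustration, not the authors' analysis, and the Results note that amplification is already becoming linear by the time fluorescence is detectable. Function names and copy numbers are hypothetical.

```python
import math


def cycles_to_threshold(initial_copies, threshold_copies, efficiency=1.0):
    """Cycles needed for initial_copies to reach threshold_copies, assuming
    each cycle multiplies the product by (1 + efficiency)."""
    return math.log(threshold_copies / initial_copies) / math.log(1.0 + efficiency)


def relative_starting_amount(cycle_a, cycle_b, efficiency=1.0):
    """Estimated ratio (sample A / sample B) of starting target, given the
    cycles at which each sample reached the same detection threshold."""
    return (1.0 + efficiency) ** (cycle_b - cycle_a)


# Hypothetical copy numbers: doubling the input saves one cycle.
shift = cycles_to_threshold(1000, 1e10) - cycles_to_threshold(2000, 1e10)
print(round(shift, 6))
print(relative_starting_amount(20, 21))
```

Under this assumption a two-fold difference in input corresponds to a one-cycle shift in the detection point, so resolving two-fold differences requires resolving single cycles.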

Conversely, if the number of target molecules is known (as it can be in genetic screening), 
continuous monitoring may provide a means of detecting false positive and false negative 
results. With a known number of target molecules, a true positive would exhibit detectable 
fluorescence by a predictable number of cycles of PCR. Increases in fluorescence detected 
before or after that cycle would indicate potential artifacts. False-negative results due 
to, for example, inhibition of DNA polymerase, may be detected by including within each PCR 
an inefficiently amplifying marker. This marker results in a fluorescence increase only 
after a large number of cycles, many more than are necessary to detect a true positive. If 
a sample fails to have a fluorescence increase after this many cycles, inhibition may be 
suspected. Since, in this assay, conclusions are drawn based on the presence or absence of 
fluorescence signal alone, such controls may be important. In any event, before any test 
based on this principle is ready for the clinic, an assessment of its false positive/false 
negative rates will need to be obtained using a large number of known samples. 
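
The control logic proposed above can be written as a small decision function. This Python sketch is only one possible reading of the paragraph: the function name, the result labels, the cycle numbers, and the tolerance of two cycles are all assumptions, and a real assay would need empirically determined windows.

```python
def classify_reaction(detection_cycle, expected_cycle, marker_cycle, tolerance=2):
    """Interpret the cycle at which a fluorescence increase was first seen.

    detection_cycle: cycle of first detectable increase, or None if none seen.
    expected_cycle:  cycle predicted for a true positive with known input.
    marker_cycle:    cycle at which the inefficiently amplifying marker alone
                     would give a signal (many cycles after expected_cycle).
    tolerance:       assumed allowable deviation in cycles (hypothetical).
    """
    if detection_cycle is None:
        # Not even the internal marker amplified.
        return "inhibition suspected"
    if abs(detection_cycle - expected_cycle) <= tolerance:
        return "positive"
    if abs(detection_cycle - marker_cycle) <= tolerance:
        return "negative (marker only)"
    return "potential artifact"


# Hypothetical cycle numbers for illustration.
print(classify_reaction(24, expected_cycle=24, marker_cycle=40))
print(classify_reaction(40, expected_cycle=24, marker_cycle=40))
print(classify_reaction(15, expected_cycle=24, marker_cycle=40))
print(classify_reaction(None, expected_cycle=24, marker_cycle=40))
```

The sketch treats a signal that appears only at the marker's cycle as a valid negative, since the marker's amplification shows that the polymerase was active.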

In summary, the inclusion in PCR of dyes whose fluorescence is enhanced upon binding dsDNA 
makes it possible to detect specific DNA amplification from outside the PCR tube. In the 
future, instruments based upon this principle may facilitate the more widespread use of PCR 
in applications that demand the high throughput of samples. 

EXPERIMENTAL PROTOCOL 

Human HLA-DQα gene amplifications containing EtBr. PCRs were set up in 100 µl volumes 
containing 10 mM Tris-HCl, pH 8.3; 50 mM KCl; 4 mM MgCl2; 2.5 units of Taq DNA polymerase 
(Perkin-Elmer Cetus, Norwalk, CT); 20 pmole each of human HLA-DQα gene specific 
oligonucleotide primers GH26 and GH27¹⁹ and approximately W copies of DQα PCR product 
diluted from a previous reaction. Ethidium bromide (EtBr; Sigma) was used at the 
concentrations indicated in Figure 2. Thermocycling proceeded for 20 cycles in a model 480 
thermocycler (Perkin-Elmer Cetus, Norwalk, CT) using a "step-cycle" program of 94°C for 1 
min. denaturation and 60°C for 30 sec. annealing and 72°C for 30 sec. extension. 

Y-chromosome specific PCR. PCRs (100 µl total reaction volume) containing 0.5 µg/ml EtBr 
were prepared as described for HLA-DQα, except with different primers and target DNAs. 
These PCRs contained 15 pmole each male DNA-specific primers Y1.1 and Y1.2²⁰, and either 
60 ng male, 60 ng female, 2 ng male, or no human DNA. Thermocycling was 94°C for 1 min. and 
60°C for 1 min. using a "step-cycle" program. The number of cycles for a sample were as 
indicated in Figure 3. Fluorescence measurement is described below. 

Allele-specific, human β-globin gene PCR. Amplifications of 100 µl volume using 0.5 µg/ml 
of EtBr were prepared as described for HLA-DQα above except with different primers and 
target DNAs. These PCRs contained either primer pair HGP2/Hβ14A (wild-type globin specific 
primers) or HGP2/Hβ14S (sickle-globin specific primers) at 10 pmole each primer per PCR. 
These primers were developed by Wu et al.²¹. Three different target DNAs were used in 
separate amplifications: 50 ng each of human DNA that was homozygous for the sickle trait 
(SS), DNA that was heterozygous for the sickle trait (AS), or DNA that was homozygous for 
the w.t. globin (AA). Thermocycling was for 30 cycles at 94°C for 1 min. and 55°C for 1 
min. using a "step-cycle" program. An annealing temperature of 55°C had been shown by Wu et 
al.²¹ to provide allele-specific amplification. Completed PCRs were photographed through a 
red filter (Wratten 23A) after placing the reaction tubes atop a model TM-36 
transilluminator (UV-Products, San Gabriel, CA). 

Fluorescence measurement. Fluorescence measurements were made on PCRs containing EtBr in a 
Fluorolog-2 fluorometer (SPEX, Edison, NJ). Excitation was at the 500 nm band with about 2 
nm bandwidth with a GG 435 nm cut-off filter (Melles Griot Inc., Irvine, CA) to exclude 
second-order light. Emitted light was detected at 5^0 nm with a bandwidth of about 7 nm. An 
OG 530 nm cut-off filter was used to remove the excitation light. 

Continuous fluorescence monitoring of PCR. Continuous monitoring of a PCR in progress was 
accomplished using the spectrofluorometer and settings described above as well as a 
fiberoptic accessory (SPEX cat. no. 1950) to both send excitation light to, and receive 
emitted light from, a PCR placed in a well of a model 480 thermocycler (Perkin-Elmer 
Cetus). The probe end of the fiberoptic cable was attached with "5 minute-epoxy" to the 
open top of a PCR tube (a 0.5 ml polypropylene centrifuge tube with its cap removed), 
effectively sealing it. The exposed top of the PCR tube and the end of the fiberoptic cable 
were shielded from room light and the room lights were kept dimmed during each run. The 
monitored PCR was an amplification of Y-chromosome-specific repeat sequences as described 
above, except using an annealing/extension temperature of 50°C. The reaction was covered 
with mineral oil (2 drops) to prevent evaporation. Thermocycling and fluorescence 
measurement were started simultaneously. A time-base scan with a 10 second integration time 
was used and the emission signal was ratioed to the excitation signal to control for 
changes in light-source intensity. Data were collected using the dm3000f, version (SPEX) 
data system. 

Wc <ton.K Bob JoncA for help with the spcctrofluormctric 
innwur«mciiU and Hcalherhell Fong for editing this manuscript. 
IMMUNO BIOLOGICAL LABORATORIES

sCD-14 ELISA

Trauma, Shock and Sepsis

The CD-14 molecule is expressed on the surface of
monocytes and some macrophages. Membrane-bound
CD-14 is a receptor for lipopolysaccharide
(LPS) complexed to LPS-Binding-Protein (LBP). The
concentration of its soluble form is altered under
certain pathological conditions. There is evidence for
an important role of sCD-14 with polytrauma, sepsis,
burnings and inflammations.
During septic conditions and acute infections it seems
to be a prognostic marker and is therefore of value in
monitoring these patients.

IBL offers an ELISA for quantitative determination of
soluble CD-14 in human serum, plasma, cell-culture
supernatants and other biological fluids.

Assay features: 12x8 determinations
(microtiter strips),
precoated with a specific
monoclonal antibody,
2x1 hour incubation,
standard range: 3-96 ng/ml
detection limit: 1 ng/ml
CV: intra- and interassay < 8%

For more information call or fax.
SIMULTANEOUS AMPLIFICATION AND DETECTION OF
SPECIFIC DNA SEQUENCES

Russell Higuchi*, Gavin Dollinger¹, P. Sean Walsh and Robert Griffith

Roche Molecular Systems, Inc., 1400 53rd St., Emeryville, CA 94608. ¹Chiron Corporation, 1400 53rd St., Emeryville, CA
94608. *Corresponding author.



We have enhanced the polymerase chain reaction (PCR) such that specific DNA
sequences can be detected without opening the reaction tube. This enhancement
requires the addition of ethidium bromide (EtBr) to a PCR. Since the
fluorescence of EtBr increases in the presence of double-stranded (ds) DNA,
an increase in fluorescence in such a PCR indicates a positive amplification,
which can be easily monitored externally. In fact, amplification can be
continuously monitored in order to follow its progress. The ability to
simultaneously amplify specific DNA sequences and detect the product of the
amplification both simplifies and improves PCR and may facilitate its
automation and more widespread use in the clinic or in other situations
requiring high sample throughput.

Although the potential benefits of PCR¹ to clinical diagnostics are well
known²,³, it is still not widely used in this setting, even though it is
four years since thermostable DNA polymerases⁴ made PCR practical. Some of the
reasons for its slow acceptance are high cost, lack of automation of pre- and
post-PCR processing steps, and false positive results from
carryover-contamination. The first two points are related in that labor is the
largest contributor to cost at the present stage of PCR development. Most
current assays require some form of "downstream" processing once thermocycling
is done in order to determine whether the target DNA sequence was present and
has amplified. These include DNA hybridization⁵,⁶, gel electrophoresis with or
without use of restriction digestion⁷,⁸, HPLC⁹, or capillary
electrophoresis¹⁰. These methods are labor-intense, have low throughput, and
are difficult to automate. The third point is also closely related to
downstream processing. The handling of the PCR product in these downstream
processes increases the chances that amplified DNA will spread through the
typing lab, resulting in a risk of "carryover" false positives in subsequent
testing¹¹.

These downstream processing steps would be eliminated if specific
amplification and detection of amplified DNA took place simultaneously within
an unopened reaction vessel. Assays in which such different processes take
place without the need to separate reaction components have been termed
"homogeneous". No truly homogeneous PCR assay has been demonstrated to date,
although progress towards this end has been reported. Chehab, et al.¹²,
developed a PCR product detection scheme using fluorescent primers that
resulted in a fluorescent PCR product. Allele-specific primers, each with
different fluorescent tags, were used to indicate the genotype of the DNA.
However, the unincorporated primers must still be removed in a downstream
process in order to visualize the result. Recently, Holland, et al.¹³,
developed an assay in which the endogenous 5' exonuclease activity of Taq DNA
polymerase was exploited to cleave a labeled oligonucleotide probe. The probe
would only cleave if PCR amplification had produced its complementary
sequence. In order to detect the cleavage products, however, a subsequent
process is again needed.

We have developed a truly homogeneous assay for PCR and PCR product detection
based upon the greatly increased fluorescence that ethidium bromide and other
DNA binding dyes exhibit when they are bound to ds-DNA¹⁴⁻¹⁶. As outlined in
Figure 1, a prototypic PCR begins with primers that are single-stranded DNA
(ss-DNA), dNTPs, and DNA polymerase. An amount of dsDNA containing the target
sequence (target DNA) is also typically present. This amount can vary,
depending on the application, from single-cell amounts of DNA¹⁷ to micrograms
per PCR¹⁸. If EtBr is present, the reagents that will fluoresce, in order of
increasing fluorescence, are free EtBr itself, and EtBr bound to the
single-stranded DNA primers and to the double-stranded target DNA (by its
intercalation between the stacked bases of the DNA double-helix). After the
first denaturation cycle, target DNA will be largely single-stranded. After a
PCR is completed, the most significant change is the increase in the amount of
dsDNA (the PCR product itself) of up to several micrograms. Formerly free EtBr
is bound to the additional dsDNA, resulting in an increase in fluorescence.
There is also some decrease in the amount of ssDNA primer, but because the
binding of EtBr to ssDNA is much less than to dsDNA, the effect of this change
on the total fluorescence of the sample is small. The fluorescence increase
can be measured by directing excitation illumination through the walls of the
amplification vessel before and after, or even continuously during,
thermocycling.

FIGURE 1 Principle of simultaneous amplification and detection of
PCR product. The components of a PCR containing EtBr that are
fluorescent are listed: EtBr itself, EtBr bound to either ssDNA or
dsDNA. There is a large fluorescence enhancement when EtBr is
bound to DNA and binding is greatly enhanced when DNA is
double-stranded. After sufficient (n) cycles of PCR, the net
increase in dsDNA results in additional EtBr binding, and a net
increase in total fluorescence.

FIGURE 2 Gel electrophoresis of PCR amplification products of the
human, nuclear gene, HLA DQα, made in the presence of
increasing amounts of EtBr (up to 8 µg/ml). The presence of
EtBr has no obvious effect on the yield or specificity of
amplification.

FIGURE 3 (A) Fluorescence measurements from PCRs that contain
0.5 µg/ml EtBr and that are specific for Y-chromosome repeat
sequences. Five replicate PCRs were begun containing each of the
DNAs specified. At each indicated cycle, one of the five replicate
PCRs for each DNA was removed from thermocycling and its
fluorescence measured. Units of fluorescence are arbitrary. (B)
UV photography of PCR tubes (0.5 ml Eppendorf-style, polypropylene
micro-centrifuge tubes) containing reactions, those starting
from 2 ng male DNA and control reactions without any DNA,
from (A).



RESULTS 

PCR in the presence of EtBr. In order to assess the effect of EtBr in PCR,
amplifications of the human HLA DQα gene¹⁹ were performed with the dye present
at concentrations from 0.06 to 8.0 µg/ml (a typical concentration of EtBr used
in staining of nucleic acids following gel electrophoresis is 0.5 µg/ml). As
shown in Figure 2, gel electrophoresis revealed little or no difference in the
yield or quality of the amplification product whether EtBr was absent or
present at any of these concentrations, indicating that EtBr does not inhibit
PCR.

Detection of human Y-chromosome specific sequences. Sequence-specific,
fluorescence enhancement of EtBr as a result of PCR was demonstrated in a
series of amplifications containing 0.5 µg/ml EtBr and primers specific to
repeat DNA sequences found on the human Y-chromosome²⁰. These PCRs initially
contained either 60 ng male, 60 ng female, 2 ng male human or no DNA. Five
replicate PCRs were begun for each DNA. After 0, 17, 21, 24 and 29 cycles of
thermocycling, a PCR for each DNA was removed from the thermocycler, and its
fluorescence measured in a spectrofluorometer and plotted vs. amplification
cycle number (Fig. 3A). The shape of this curve reflects the fact that by the
time an increase in fluorescence can be detected, the increase in DNA is
becoming linear and not exponential with cycle number. As shown, the
fluorescence increased about three-fold over the background fluorescence for
the PCRs containing human male DNA, but did not significantly increase for
negative control PCRs, which contained either no DNA or human female DNA. The
more male DNA present to begin with (60 ng versus 2 ng), the fewer cycles were
needed to give a detectable increase in fluorescence. Gel electrophoresis on
the products of these amplifications showed that DNA fragments of the expected
size were made in the male DNA containing reactions and that little DNA
synthesis took place in the control samples.

In addition, the increase in fluorescence was visualized by simply laying the
completed, unopened PCRs on a UV transilluminator and photographing them
through a red filter. This is shown in Figure 3B for the reactions that began
with 2 ng male DNA and those with no DNA.

Detection of specific alleles of the human β-globin gene. In order to
demonstrate that this approach has adequate specificity to allow genetic
screening, a detection of the sickle-cell anemia mutation was performed.
Figure 4 shows the fluorescence from completed amplifications containing EtBr
(0.5 µg/ml) as detected by photography of the reaction tubes on a UV
transilluminator. These reactions were performed using primers specific for
either the wild-type or sickle-cell mutation of the human β-globin gene. The
specificity for each allele is imparted by placing the sickle-mutation site at
the terminal 3' nucleotide of one primer. By using an appropriate primer
annealing temperature, primer extension, and thus amplification, can take
place only if the 3' nucleotide of the primer is complementary to the β-globin
allele present²¹,²².

Each pair of amplifications shown in Figure 4 consists of a reaction with
either the wild-type allele specific (left tube) or sickle-allele specific
(right tube) primers. Three different DNAs were typed: DNA from a homozygous,
wild-type β-globin individual (AA); from a heterozygous sickle β-globin
individual (AS); and from a homozygous sickle β-globin individual (SS). Each
DNA (50 ng genomic DNA to start each PCR) was analyzed in triplicate (3 pairs
of reactions each). The DNA type was reflected in the relative fluorescence
intensities in each pair of completed amplifications. There was a significant
increase in fluorescence only where a β-globin allele DNA matched the primer
set. When measured on a spectrofluorometer (data not shown), this fluorescence
was about three times that present in a PCR where both β-globin alleles were
mismatched to the primer set. Gel electrophoresis (not shown) established that
this increase in fluorescence was due to the synthesis of nearly a microgram
of a DNA fragment of the expected size for β-globin. There was little
synthesis of dsDNA in reactions in which the allele-specific primer was
mismatched to both alleles.
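
The paired-tube readout described above can be read as a simple decision rule. The following is a minimal sketch of that rule, not the authors' procedure: the function name, the use of a mismatched-control reading as background, and the 2-fold cutoff are all illustrative assumptions (the text reports roughly a three-fold difference between matched and mismatched reactions).

```python
def call_genotype(wt_tube, sickle_tube, background, min_fold=2.0):
    """Call a beta-globin genotype from one pair of completed PCRs.

    wt_tube     -- fluorescence of the tube with wild-type-specific primers
    sickle_tube -- fluorescence of the tube with sickle-specific primers
    background  -- fluorescence of a fully mismatched control reaction
    min_fold    -- hypothetical cutoff (fold over background) for "positive"
    All fluorescence values are in the same arbitrary units.
    """
    if background <= 0:
        raise ValueError("background must be positive")
    wt_positive = wt_tube / background >= min_fold
    sickle_positive = sickle_tube / background >= min_fold
    if wt_positive and sickle_positive:
        return "AS"   # both alleles amplified: heterozygote
    if wt_positive:
        return "AA"   # only the wild-type primers amplified
    if sickle_positive:
        return "SS"   # only the sickle primers amplified
    return "no call"  # neither tube rose above background


# Illustrative readings only; these are not data from the paper.
print(call_genotype(3.0, 1.0, 1.0))
print(call_genotype(3.1, 2.9, 1.0))
```

Returning "no call" when neither tube is positive keeps a failed or inhibited pair from being mistaken for a genotype.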

Continuous monitoring of a PCR. Using a fiber optic device, it is possible to
direct excitation illumination from a spectrofluorometer to a PCR undergoing
thermocycling and to return its fluorescence to the spectrofluorometer. The
fluorescence readout of such an arrangement, directed at an EtBr-containing
amplification of Y-chromosome specific sequences from 25 ng of human male DNA,
is shown in Figure 5. The readout from a control PCR with no target DNA is
also shown. Thirty cycles of PCR were monitored for each.

The fluorescence trace as a function of time clearly shows the effect of the
thermocycling. Fluorescence intensity rises and falls inversely with
temperature. The fluorescence intensity is minimum at the denaturation
temperature (94°C) and maximum at the annealing/extension temperature (50°C).
In the negative-control PCR, these fluorescence maxima and minima do not
change significantly over the thirty thermocycles, indicating that there is
little dsDNA synthesis without the appropriate target DNA, and there is little
if any bleaching of EtBr during the continuous illumination of the sample.

In the PCR containing male DNA, the fluorescence maxima at the
annealing/extension temperature begin to increase at about 4000 seconds of
thermocycling, and continue to increase with time, indicating that dsDNA is
being produced at a detectable level. Note that the fluorescence minima at the
denaturation temperature do not significantly increase, presumably because at
this temperature there is no dsDNA for EtBr to bind. Thus the course of the
amplification is followed by tracking the fluorescence increase at the
annealing temperature. Analysis of the products of these two amplifications by
gel electrophoresis showed a DNA fragment of the expected size for the male
DNA containing sample and no detectable DNA synthesis for the control sample.
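
Tracking the annealing-temperature maxima, as described above, can be sketched in a few lines. This is an illustration under stated assumptions rather than the analysis actually used: the trace is assumed to be a list of (time in seconds, fluorescence) samples, the cycle length is assumed constant and known, and the baseline window and 10% rise criterion are hypothetical choices.

```python
def per_cycle_maxima(samples, cycle_seconds):
    """Return the maximum fluorescence seen in each thermocycle.

    samples       -- iterable of (time_s, fluorescence) pairs
    cycle_seconds -- assumed constant duration of one cycle
    """
    maxima = {}
    for time_s, fluorescence in samples:
        cycle = int(time_s // cycle_seconds)
        if cycle not in maxima or fluorescence > maxima[cycle]:
            maxima[cycle] = fluorescence
    return [maxima[cycle] for cycle in sorted(maxima)]


def first_rising_cycle(maxima, baseline_cycles=5, rise_fraction=0.10):
    """Index of the first cycle whose maximum exceeds the early baseline.

    The baseline is the mean maximum of the first `baseline_cycles` cycles;
    both defaults are hypothetical. Returns None if no cycle qualifies.
    """
    if len(maxima) <= baseline_cycles:
        return None
    baseline = sum(maxima[:baseline_cycles]) / baseline_cycles
    for index in range(baseline_cycles, len(maxima)):
        if maxima[index] > baseline * (1.0 + rise_fraction):
            return index
    return None
```

Because the minima at the denaturation temperature stay flat in both reactions, only the maxima carry the amplification signal, which is why the sketch discards everything but the per-cycle peak.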

DISCUSSION 

Downstream processes such as hybridization to a sequence-specific probe can
enhance the specificity of DNA detection by PCR. The elimination of these
processes means that the specificity of this homogeneous assay depends solely
on that of PCR. In the case of sickle-cell disease, we have shown that PCR
alone has sufficient DNA sequence specificity to permit genetic screening.
Using appropriate amplification conditions, there is little non-specific
production of dsDNA in the absence of the appropriate target allele.

The specificity required to detect pathogens can be more or less than that
required to do genetic screening, depending on the number of pathogens in the
sample and the amount of other DNA that must be taken with the sample. A
difficult target is HIV, which requires detection of a viral genome that can
be at the level of a few copies per thousands of host cells. Compared with
genetic screening, which is performed on cells containing at least one copy of
the target sequence, HIV detection requires both more specificity and the
input of more total DNA (up to microgram amounts) in order to have sufficient
numbers of target sequences. This large amount of starting DNA in an
amplification significantly increases the background fluorescence over which
any additional fluorescence produced by PCR must be detected. An additional
complication that occurs with targets in low copy-number is the formation of
the "primer-dimer" artifact. This is the result of the extension of one primer
using the other primer as a template. Although this occurs infrequently, once
it occurs the extension product is a substrate for PCR amplification, and can
compete with true PCR targets if those targets are rare. The primer-dimer
product is of course dsDNA and thus is a potential source of false signal in
this homogeneous assay.

FIGURE 4 UV photography of PCR tubes containing amplifications
using EtBr that are specific to wild-type (A) or sickle (S) alleles of
the human β-globin gene. The left of each pair of tubes contains
allele-specific primers to the wild-type alleles, the right tube
primers to the sickle allele. The photograph was taken after 30
cycles of PCR, and the input DNAs and the alleles they contain
are indicated. Fifty ng of DNA was used to begin PCR. Typing
was done in triplicate (3 pairs of PCRs) for each input DNA.

FIGURE 5 Continuous, real-time monitoring of a PCR. A fiber optic
was used to carry excitation light to a PCR in progress and also
emitted light back to a fluorometer (see Experimental Protocol).
Amplification using human male-DNA specific primers in a PCR
starting with 20 ng of human male DNA (top), or in a control
PCR without DNA (bottom), were monitored. Thirty cycles of
PCR were followed for each. The temperature cycled between
94°C (denaturation) and 50°C (annealing and extension). Note in
the male DNA PCR, the cycle (time) dependent increase in
fluorescence at the annealing/extension temperature.

To increase PCR specificity and reduce the effect of primer-dimer
amplification, we are investigating a number of approaches, including the use
of nested-primer amplifications that take place in a single tube, and the
"hot-start", in which nonspecific amplification is reduced by raising the
temperature of the reaction before DNA synthesis begins. Preliminary results
using these approaches suggest that primer-dimer is effectively reduced and it
is possible to detect the increase in EtBr fluorescence in a PCR instigated by
a single HIV genome in a background of 10* cells. With larger numbers of
cells, the background fluorescence contributed by genomic DNA becomes
problematic. To reduce this background, it may be possible to use
sequence-specific DNA-binding dyes that can be made to preferentially bind PCR
product over genomic DNA by incorporating the dye-binding DNA sequence into
the PCR product through a 5' "add-on" to the oligonucleotide primer.

We have shown that the detection of fluorescence generated by an
EtBr-containing PCR is straightforward, both once PCR is completed and
continuously during thermocycling. The ease with which automation of specific
DNA detection can be accomplished is the most promising aspect of this assay.
The fluorescence analysis of completed PCRs is already possible with existing
instrumentation in 96-well format. In this format, the fluorescence in each
PCR can be quantitated before, after, and even at selected points during
thermocycling by moving the rack of PCRs to a 96-microwell plate fluorescence
reader.

The instrumentation necessary to continuously monitor multiple PCRs
simultaneously is also simple in principle. A direct extension of the
apparatus used here is to have multiple fiberoptics transmit the excitation
light and fluorescent emissions to and from multiple PCRs. The ability to
monitor multiple PCRs continuously may allow quantitation of target DNA copy
number. Figure 3 shows that the larger the amount of starting target DNA, the
sooner during PCR a fluorescence increase is detected. Preliminary experiments
(Higuchi and Dollinger, manuscript in preparation) with continuous monitoring
have shown a sensitivity to two-fold differences in initial target DNA
concentration.
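
The link between starting amount and the cycle at which fluorescence rises can be made concrete with an idealized model. The sketch below assumes purely exponential growth, N_c = N_0 (1 + E)^c, up to a fixed detection threshold. This model, the function name and the efficiency parameter E are assumptions introduced here for illustration; the Results note that amplification is already becoming linear by the time fluorescence is detectable, so the numbers are approximate at best.

```python
import math


def cycles_to_threshold(initial_copies, threshold_copies, efficiency=1.0):
    """Cycles needed for N0 * (1 + E)**c to reach the detection threshold.

    efficiency -- assumed per-cycle efficiency E, where 1.0 means perfect
                  doubling every cycle (an idealization).
    """
    if initial_copies <= 0 or threshold_copies <= 0:
        raise ValueError("copy numbers must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return math.log(threshold_copies / initial_copies) / math.log(1 + efficiency)


# With perfect doubling, a two-fold difference in starting copies shifts
# the threshold crossing by exactly one cycle, whatever the threshold is.
shift = cycles_to_threshold(1000, 1e12) - cycles_to_threshold(2000, 1e12)
print(round(shift, 6))
```

Under the same idealization, the 30-fold difference between 60 ng and 2 ng of male DNA would correspond to a shift of log2(30), a little under five cycles.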

Conversely, if the number of target molecules is known (as it can be in
genetic screening), continuous monitoring may provide a means of detecting
false positive and false negative results. With a known number of target
molecules, a true positive would exhibit detectable fluorescence by a
predictable number of cycles of PCR. Increases in fluorescence detected before
or after that cycle would indicate potential artifacts. False negative results
due to, for example, inhibition of DNA polymerase, may be detected by
including within each PCR an inefficiently amplifying marker. This marker
results in a fluorescence increase only after a large number of cycles (many
more than are necessary to detect a true positive). If a sample fails to have
a fluorescence increase after this many cycles, inhibition may be suspected.
Since, in this assay, conclusions are drawn based on the presence or absence
of fluorescence signal alone, such controls may be important. In any event,
before any test based on this principle is ready for the clinic, an assessment
of its false positive/false negative rates will need to be obtained using a
large number of known samples.
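
The control logic proposed above can be written out as a small classifier. This is only a sketch of one possible reading: the function, its labels, and the tolerance window are hypothetical, and it assumes the marker's rise cycle lies well beyond the expected cycle for a true positive.

```python
def interpret_run(rise_cycle, expected_cycle, tolerance, marker_cycle, cycles_run):
    """Classify one continuously monitored PCR.

    rise_cycle     -- cycle at which a fluorescence increase was first
                      detected, or None if no increase was seen
    expected_cycle -- cycle at which a true positive is predicted to rise
    tolerance      -- hypothetical allowed deviation, in cycles
    marker_cycle   -- cycle at which the inefficiently amplifying marker
                      is expected to produce a signal
    cycles_run     -- total number of cycles performed
    """
    if rise_cycle is None:
        if cycles_run >= marker_cycle:
            # Even the marker failed to amplify.
            return "inhibition suspected"
        return "inconclusive"
    if abs(rise_cycle - expected_cycle) <= tolerance:
        return "true positive"
    if rise_cycle >= marker_cycle - tolerance:
        # Only the late-rising marker appeared; the target is absent.
        return "negative (marker only)"
    return "potential artifact"


# Illustrative values only.
print(interpret_run(22, 22, 2, 40, 45))
print(interpret_run(None, 22, 2, 40, 45))
```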

In summary, the inclusion in PCR of dyes whose fluorescence is enhanced upon
binding dsDNA makes it possible to detect specific DNA amplification from
outside the PCR tube. In the future, instruments based upon this principle may
facilitate the more widespread use of PCR in applications that demand the high
throughput of samples.

EXPERIMENTAL PROTOCOL 

Human HLA-DQα gene amplifications containing EtBr.
PCRs were set up in 100 µl volumes containing 10 mM Tris-HCl,
pH 8.3; 50 mM KCl; 4 mM MgCl₂; $£ units of Taq DNA
polymerase (Perkin-Elmer Cetus, Norwalk, CT); 10 pmole each
of human HLA-DQα gene specific oligonucleotide primers
GH26 and GH27¹⁹ and approximately 1<T copies of DQα PCR
product diluted from a previous reaction. Ethidium bromide
(EtBr, Sigma) was used at the concentrations indicated in Figure
2. Thermocycling proceeded for 20 cycles in a model 480
thermocycler (Perkin-Elmer Cetus, Norwalk, CT) using a "step-cycle"
program of 94°C for 2 min denaturation and WrC for 30
sec annealing and 72°C for 30 sec extension.
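
As a worked check on the reagent amounts, the quantities quoted in this protocol and in the Results can be converted to molar concentrations. The helper names are illustrative, and the molar mass of ethidium bromide (C21H20BrN3, about 394.3 g/mol) is a standard value supplied here, not a figure from the paper.

```python
ETBR_MOLAR_MASS = 394.3  # g/mol for ethidium bromide; not stated in the source


def primer_micromolar(pmol, volume_ul):
    """Primer concentration in µM (1 pmol per µl equals 1 µM)."""
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return pmol / volume_ul


def etbr_micromolar(ug_per_ml):
    """EtBr concentration in µM from a mass concentration in µg/ml."""
    # µg/ml equals mg/L; dividing by g/mol gives mmol/L, so scale by 1000.
    return ug_per_ml / ETBR_MOLAR_MASS * 1000.0


# 10 pmole of each primer in a 100 µl reaction.
print(primer_micromolar(10, 100))
# The range tested in the Results: 0.06, 0.5 and 8.0 µg/ml EtBr.
for conc in (0.06, 0.5, 8.0):
    print(conc, round(etbr_micromolar(conc), 2))
```

On this arithmetic each primer is at 0.1 µM, and 0.5 µg/ml EtBr is a little under 1.3 µM.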

Y-chromosome specific PCR. PCRs (100 µl total reaction
volume) containing 0.5 µg/ml EtBr were prepared as described
for HLA-DQα, except with different primers and target DNAs.
These PCRs contained 15 pmole each male DNA-specific primers
Y1.1 and Y1.2²⁰, and either 60 ng male, 60 ng female, 2 ng male,
or no human DNA. Thermocycling was 94°C for 1 min and $&C
for 1 min using a "step-cycle" program. The number of cycles for
a sample were as indicated in Figure 3. Fluorescence measurement
is described below.

Allele-specific, human β-globin gene PCR. Amplifications of
100 µl volume using 0.5 µg/ml EtBr were prepared as
described for HLA-DQα above except with different primers and
target DNAs. These PCRs contained either primer pair HGP2/Hβ14A
(wild-type globin specific primers) or HGP2/Hβ14S
(sickle-globin specific primers) at 10 pmole each primer per PCR.
These primers were developed by Wu et al.²¹. Three different
target DNAs were used in separate amplifications: 50 ng each of
human DNA that was homozygous for the sickle trait (SS), DNA
that was heterozygous for the sickle trait (AS), or DNA that was
homozygous for the w.t. globin (AA). Thermocycling was for 30
cycles at 94°C for 1 min and 55°C for 1 min using a "step-cycle"
program. An annealing temperature of 55°C had been shown by
Wu et al.²¹ to provide allele-specific amplification. Completed
PCRs were photographed through a red filter (Wratten 23A)
after placing the reaction tubes atop a model TM-36 transilluminator
(UV-products, San Gabriel, CA).

Fluorescence measurement. Fluorescence measurements were
made on PCRs containing EtBr in a Fluorolog-2 fluorometer
(SPEX, Edison, NJ). Excitation was at the 500 nm band with
about 2 nm bandwidth with a GG 435 nm cut-off filter (Melles
Griot, Inc., Irvine, CA) to exclude second-order light. Emitted
light was detected at 570 nm with a bandwidth of about 7 nm. An
OG 530 nm cut-off filter was used to remove the excitation light.

Continuous fluorescence monitoring of PCR. Continuous
monitoring of a PCR in progress was accomplished using the
spectrofluorometer and settings described above as well as a
fiberoptic accessory (SPEX cat. no. 1950) to both send excitation
light to, and receive emitted light from, a PCR placed in a well of
a model 480 thermocycler (Perkin-Elmer Cetus). The probe end
of the fiberoptic cable was attached with "5 minute-epoxy" to the
open top of a PCR tube (a 0.5 ml polypropylene centrifuge tube
with its cap removed), effectively sealing it. The exposed top of
the PCR tube and the end of the fiberoptic cable were shielded
from room light and the room lights were kept dimmed during
each run. The monitored PCR was an amplification of Y-chromosome-specific
repeat sequences as described above, except
using an annealing/extension temperature of 50°C. The reaction
was covered with mineral oil (2 drops) to prevent evaporation.
Thermocycling and fluorescence measurement were started
simultaneously. A time-base scan with a 10 second integration time
was used and the emission signal was ratioed to the excitation
signal to control for changes in light-source intensity. Data were
collected using the dm3000f, version 2.5 (SPEX) data system.

We thank Bob Jones for help with the spectrofluorometric
measurements and Heatherbell Fong for editing this manuscript.
Oligonucleotides with Fluorescent Dyes at
Opposite Ends Provide a Quenched Probe
System Useful for Detecting PCR Product
and Nucleic Acid Hybridization




Kenneth J. Livak, Susan J.A. Flood, Jeffrey Marmaro, William Giusti, and Karin Deetz

Perkin-Elmer, Applied Biosystems Division, Foster City, California 94404 



The 5' nuclease PCR assay detects the accumulation of specific PCR product
by hybridization and cleavage of a double-labeled fluorogenic probe during
the amplification reaction. The probe is an oligonucleotide with both a
reporter fluorescent dye and a quencher dye attached. An increase in reporter
fluorescence intensity indicates that the probe has hybridized to the target
PCR product and has been cleaved by the 5' → 3' nucleolytic activity of Taq
DNA polymerase. In this study, probes with the quencher dye attached to an
internal nucleotide were compared with probes with the quencher dye attached
to the 3'-end nucleotide. In all cases, the reporter dye was attached to the
5' end. All intact probes showed quenching of the reporter fluorescence. In
general, probes with the quencher dye attached to the 3'-end nucleotide
exhibited a larger signal in the 5' nuclease PCR assay than the internally
labeled probes. It is proposed that the larger signal is caused by increased
likelihood of cleavage by Taq DNA polymerase when the probe is hybridized to
a template strand during PCR. Probes with the quencher dye attached to the
3'-end nucleotide also exhibited an increase in reporter fluorescence
intensity when hybridized to a complementary strand. Thus, oligonucleotides
with reporter and quencher dyes attached at opposite ends can be used as
homogeneous hybridization probes.



A homogeneous assay for detecting the accumulation of specific PCR product
that uses a double-labeled fluorogenic probe was described by Lee et al.(1)
The assay exploits the 5' → 3' nucleolytic activity of Taq DNA
polymerase(2,3) and is diagramed in Figure 1. The fluorogenic probe consists
of an oligonucleotide with a reporter fluorescent dye, such as a fluorescein,
attached to the 5' end; and a quencher dye, such as a rhodamine, attached
internally. When the fluorescein is excited by irradiation, its fluorescent
emission will be quenched if the rhodamine is close enough to be excited
through the process of fluorescence energy transfer (FET).(4,5) During PCR,
if the probe is hybridized to a template strand, Taq DNA polymerase will
cleave the probe because of its inherent 5' → 3' nucleolytic activity. If the
cleavage occurs between the fluorescein and rhodamine dyes, it causes an
increase in fluorescein fluorescence intensity because the fluorescein is no
longer quenched. The increase in fluorescein fluorescence intensity indicates
that the probe-specific PCR product has been generated. Thus, FET between a
reporter dye and a quencher dye is critical to the performance of the probe
in the 5' nuclease PCR assay.

Quenching is completely dependent on the physical proximity of the two
dyes.(6) Because of this, it has been assumed that the quencher dye must be
attached near the 5' end. Surprisingly, we have found that attaching a
rhodamine dye at the 3' end of a probe still provides adequate quenching for
the probe to perform in the 5' nuclease PCR assay. Furthermore, cleavage of
this type of probe is not required to achieve some reduction in quenching.
Oligonucleotides with a reporter dye on the 5' end and a quencher dye on the
3' end exhibit a much higher reporter fluorescence when double-stranded as
compared with single-stranded. This should make it possible to use this type
of double-labeled probe for homogeneous detection of nucleic acid
hybridization.
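
The distance dependence of quenching can be illustrated with the standard Förster relation, E = 1 / (1 + (r / R0)^6), where r is the dye separation and R0 is the distance at which transfer is 50% efficient. This relation is textbook background and is not stated in the text; the 50 Å value used below is purely an illustrative assumption, not a measured value for the fluorescein/rhodamine pair used here.

```python
def fret_efficiency(distance, forster_radius):
    """Energy-transfer efficiency for a donor/acceptor pair.

    distance       -- separation between reporter and quencher
    forster_radius -- R0, the separation giving 50% transfer
    Both arguments must use the same length unit.
    """
    if distance < 0 or forster_radius <= 0:
        raise ValueError("distance must be >= 0 and forster_radius > 0")
    return 1.0 / (1.0 + (distance / forster_radius) ** 6)


# Hypothetical R0 of 50 (angstroms); separations chosen for illustration.
for separation in (25, 50, 100):
    print(separation, round(fret_efficiency(separation, 50), 3))
```

The sixth-power dependence means that quenching falls off steeply once the dyes are separated by more than R0. This fits the observation that a probe held extended in a double strand shows more reporter fluorescence than the flexible single-stranded form.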



MATERIALS AND METHODS 
Oligonucleotides 

Table 1 shows the nucleotide sequence 
of the oligonucleotides used in this 
study. Linker arm nucleotide (LAN) 
phosphoramidite was obtained from 
Glen Research. The standard DNA phos- 
phoramidites, 6-carboxyfluorescein (6- 
FAM) phosphoramidite, 6-carboxytet- 
ramethylrhodamine succinimidyl ester 
(TAMRA NHS ester), and Phosphalink 
for attaching a 3'-blocking phosphate, 
were obtained from Perkin-Elmer, Ap- 
plied Biosystems Division. Oligonucle- 
otide synthesis was performed using an 
ABI model 394 DNA synthesizer (Applied 
Biosystems). Primer and complement 
oligonucleotides were purified using 
Oligo Purification Cartridges (Applied 
Biosystems). Double-labeled probes were 
synthesized with 6-FAM-labeIed phos- 
phoramidite at the 5' end, LAN replacing 
one of the Ts in the sequence, and Phos- 
phalink at the 3' end. Following de- 
protection and ethanol precipitation, 
TAMRA NHS ester was coupled to the 
LAN-containing oligonucleotide in 250 
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FIGURE 1 Diagram of 5' nuclease assay. Stepwise representation of the 5' - 3' nudeoryticac- 
S of ' ^DNA^ymerase acting on a fluc-rogenic probe during one extension phase of POL 



mM Na-bicarbonate buffer (pH 9.0) at 
room temperature. Unreacted dye was 
removed by passage over a PD-10 Sepha- 
dex column. Finally, the double-labeled 
probe was purified by preparative high- 
performance liquid chromatography 
(HPLC) using an Aquapore C8 220 × 4.6-mm
column with 7-µm particle size. The
column was developed with a 24-min
linear gradient of 8-20% acetonitrile in
0.1 M TEAA (triethylamine acetate).
Probes are named by designating the sequence
from Table 1 and the position of
the LAN-TAMRA moiety. For example,
probe A1-7 has sequence A1 with LAN-TAMRA
at nucleotide position 7 from the
5' end.
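
The naming convention above can be illustrated with a minimal Python sketch. The function names `parse_probe_name` and `label_probe` are hypothetical helpers, not part of the original study, and the single-letter annotation (R for the 5' reporter, Q for the LAN-TAMRA position, p for the 3' phosphate) is assumed from the probe listing that accompanies Figure 2. The A1 sequence is taken from Table 1.

```python
def parse_probe_name(name: str) -> tuple[str, int]:
    """Split a probe name such as 'A1-7' into (sequence name, quencher position)."""
    seq_name, position = name.rsplit("-", 1)
    return seq_name, int(position)


def label_probe(sequence: str, quencher_pos: int) -> str:
    """Annotate a probe: R = 5' reporter (6-FAM), Q = LAN-TAMRA replacing a T,
    p = 3' phosphate. Positions are counted from the 5' end, starting at 1."""
    base = sequence[quencher_pos - 1]
    if base != "T":
        # In the study, LAN-TAMRA always substitutes for a T.
        raise ValueError(f"position {quencher_pos} is {base}, not T")
    return "R" + sequence[:quencher_pos - 1] + "Q" + sequence[quencher_pos:] + "p"


# Sequence A1 as listed in Table 1 (without the 3' phosphate marker).
SEQUENCES = {"A1": "ATGCCCTCCCCCATGCCATCCTGCGT"}

seq_name, pos = parse_probe_name("A1-7")
print(label_probe(SEQUENCES[seq_name], pos))
```

For probe A1-7 this yields the same annotated string that appears for that probe in the Figure 2 listing.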



PCR Systems

All PCR amplifications were performed in the Perkin-Elmer GeneAmp PCR
System 9600 using 50-µl reactions that contained 10 mM Tris-HCl (pH 8.3),
50 mM KCl, 200 µM dATP, 200 µM dCTP, 200 µM dGTP, 400 µM dUTP, 0.5 unit of
AmpErase uracil N-glycosylase (Perkin-Elmer), and 1.25 unit of AmpliTaq
DNA polymerase (Perkin-Elmer). A 295-bp segment from exon 3 of the human
β-actin gene (nucleotides 2141-2435 in the sequence of Nakajima-Iijima
et al.)(7) was amplified using primers AFP and ARP (Table 1), which are
modified slightly from those of du Breuil et al.(8) Actin amplification
reactions contained 4 mM MgCl2, 20 ng of human genomic DNA, 50 nM A1 or
A3 probe, and 300 nM each primer. The thermal regimen was 50°C (2 min),
95°C (10 min), 40 cycles of 95°C (20 sec), 60°C (1 min), and hold at 72°C.
A 515-bp segment was amplified from a plasmid that consists of a segment
of λ DNA (nucleotides 32,220-32,747) inserted in the SmaI site of vector
pUC119. These reactions contained 3.5 mM MgCl2, 1 ng of plasmid DNA,
50 nM P2 or P5 probe, 200 nM primer F119, and 200 nM primer R119. The
thermal regimen was 50°C (2 min), 95°C (10 min), 25 cycles of 95°C
(20 sec), 57°C (1 min), and hold at 72°C.



Fluorescence Detection 

For each amplification reaction, a 40-µl aliquot of a sample was
transferred to an individual well of a white, 96-well microtiter plate
(Perkin-Elmer). Fluorescence was measured on the Perkin-Elmer TaqMan
LS-50B System, which consists of a luminescence spectrometer with plate
reader assembly, a 485-nm excitation filter, and a 515-nm emission filter.
Excitation was at 488 nm using a 5-nm slit width. Emission was measured at
518 nm for 6-FAM (the reporter or R value) and 582 nm for TAMRA (the
quencher or Q value) using a 10-nm slit width. To determine the increase
in reporter emission that is caused by cleavage of the probe during PCR,
three normalizations are applied to the raw emission data. First, emission
intensity of a buffer blank is subtracted for each wavelength. Second,
emission intensity of the reporter is



TABLE 1 Sequences of Oligonucleotides

Name  Type        Sequence
F119  primer      ACCCACAGGAACTGATCACCACTC
R119  primer      ATGTCGCGTTCCGGCTGACGTTCTGC
P2    probe       TCGCATTACTGATCGTTGCCAACCAGTp
P2C   complement  GTACTGGTTGGCAACGATCAGTAATGCGATG
P5    probe       CGGATTTGCTGGTATCTATGACAAGGATp
P5C   complement  TTCATCCTTGTCATAGATACCAGCAAATCCG
AFP   primer      TCACCCACACTGTGCCCATCTACGA
ARP   primer      CAGCGGAACCGCTCATTGCCAATGG
A1    probe       ATGCCCTCCCCCATGCCATCCTGCGTp
A1C   complement  AGACGCAGGATGGCATGGGGGAGGGCATAC
A3    probe       CGCCCTGGACTTCGAGCAAGAGATp
A3C   complement  CCATCTCTTGCTCGAAGTCCAGGGCGAC

For each oligonucleotide used in this study, the nucleic acid sequence is given in the
5' → 3' direction. There are three types of oligonucleotides: PCR primer, fluorogenic probe used
in the 5' nuclease assay, and complement used to hybridize to the corresponding probe. For the
probes, [illegible] indicates a position where LAN with TAMRA attached was substituted
for a T. (p) The presence of a 3' phosphate on each probe.
A1-2   RAQGCCCTCCCCCATGCCATCCTGCGTp
A1-7   RATGCCCQCCCCCATGCCATCCTGCGTp
A1-14  RATGCCCTCCCCCAQGCCATCCTGCGTp
A1-19  RATGCCCTCCCCCATGCCAQCCTGCGTp
A1-22  RATGCCCTCCCCCATGCCATCCQGCGTp
A1-26  RATGCCCTCCCCCATGCCATCCTGCGQp

        518 nm                        582 nm
Probe   no temp.      + temp.         no temp.      + temp.       RQ−           RQ+           ΔRQ
A1-2    25.5 ± 2.1    32.7 ± 1.9      38.2 ± 3.0    38.2 ± 2.0    0.67 ± 0.01   0.86 ± 0.06   0.19 ± 0.06
A1-7    53.5 ± 4.3    395.1 ± 21.4    108.5 ± 6.3   110.3 ± 5.3   0.49 ± 0.03   3.58 ± 0.17   3.09 ± 0.18
A1-14   127.0 ± 4.9   403.5 ± 19.1    108.7 ± 5.3   93.1 ± 6.3    1.16 ± 0.02   4.34 ± 0.15   3.18 ± 0.15
A1-19   187.5 ± 17.9  422.7 ± 7.7     70.3 ± 7.4    73.0 ± 2.8    2.67 ± 0.05   5.80 ± 0.15   3.13 ± 0.16
A1-22   224.6 ± 9.4   482.2 ± 43.6    100.0 ± 4.0   96.2 ± 9.6    2.25 ± 0.03   5.02 ± 0.11   2.77 ± 0.12
A1-26   160.2 ± 8.9   454.1 ± 16.4    93.1 ± 5.4    90.7 ± 3.2    1.72 ± 0.02   5.01 ± 0.08   3.29 ± 0.08

FIGURE 2 Results of 5' nuclease assay comparing β-actin probes with TAMRA at different nucleotide
positions. As described in Materials and Methods, PCR amplifications containing the indicated
probes were performed, and the fluorescence emission was measured at 518 and 582 nm.
Reported values are the average ± 1 S.D. for six reactions run without added template (no temp.)
and six reactions run with template (+ temp.). The RQ ratio was calculated for each individual
reaction and averaged to give the reported RQ− and RQ+ values.



divided by the emission intensity of the quencher to give an RQ ratio for
each reaction tube. This normalizes for well-to-well variations in probe
concentration and fluorescence measurement. Finally, ΔRQ is calculated by
subtracting the RQ value of the no-template control (RQ−) from the RQ
value for the complete reaction including template (RQ+).
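
The three normalizations described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the blank-value parameters are assumptions, and the worked example uses the mean intensities reported for probe A1-7 in Figure 2 (treated here as already blank-subtracted), whereas the study computed the ratio for each individual reaction before averaging.

```python
from statistics import mean


def rq_ratio(reporter, quencher, blank_reporter=0.0, blank_quencher=0.0):
    """Blank-subtract each wavelength, then divide the reporter emission
    (518 nm) by the quencher emission (582 nm)."""
    return (reporter - blank_reporter) / (quencher - blank_quencher)


def delta_rq(rq_with_template, rq_no_template):
    """Mean RQ+ minus mean RQ- over replicate reactions."""
    return mean(rq_with_template) - mean(rq_no_template)


# Mean intensities for probe A1-7 from Figure 2 (approximation: one
# "reaction" built from the reported means, not the six raw replicates).
rq_minus = rq_ratio(53.5, 108.5)   # no template
rq_plus = rq_ratio(395.1, 110.3)   # with template
example_delta = delta_rq([rq_plus], [rq_minus])
print(round(rq_minus, 2), round(rq_plus, 2), round(example_delta, 2))
```

With these inputs the sketch reproduces the RQ−, RQ+, and ΔRQ values reported for A1-7 to two decimal places.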

RESULTS 

A series of probes with increasing distances between the fluorescein
reporter and rhodamine quencher were tested to investigate the minimum and
maximum spacing that would give an acceptable performance in the 5'
nuclease PCR assay. These probes hybridize to a target sequence in the
human β-actin gene. Figure 2 shows the results of an experiment in which
these probes were included in PCR that amplified a segment of the β-actin
gene containing the target sequence. Performance in the 5' nuclease PCR
assay is monitored by the magnitude of ΔRQ, which is a measure of the
increase in reporter fluorescence caused by PCR amplification of the probe
target. Probe A1-2 has a ΔRQ value that is close to zero, indicating that
the probe was not cleaved appreciably during the amplification reaction.
This suggests that with the quencher dye on the second nucleotide from the
5' end, there is insufficient room for Taq polymerase to cleave
efficiently between the reporter and quencher. The other five probes
exhibited comparable ΔRQ values that are clearly different from zero.
Thus, all five probes are being cleaved during PCR amplification,
resulting in a similar increase in reporter fluorescence. It should be
noted that complete digestion of a probe produces a much larger increase
in reporter fluorescence than that observed in Figure 2 (data not shown).
Thus, even in reactions where amplification occurs, the majority of probe
molecules remain uncleaved. It is mainly for this reason that the
fluorescence intensity of the quencher dye TAMRA changes little with
amplification of the target. This is what allows us to use the 582-nm
fluorescence reading as a normalization factor.

The magnitude of RQ− depends mainly on the quenching efficiency inherent
in the specific structure of the probe and the purity of the
oligonucleotide. Thus, the larger RQ− values indicate that probes A1-14,
A1-19, A1-22, and A1-26 probably have reduced quenching as compared with
A1-7. Still, the degree of quenching is sufficient to detect a highly
significant increase in reporter fluorescence when each of these probes is
cleaved during PCR.

To further investigate the ability of TAMRA on the 3' end to quench 6-FAM
on the 5' end, three additional pairs of probes were tested in the 5'
nuclease PCR assay. For each pair, one probe has TAMRA attached to an
internal nucleotide and the other has TAMRA attached to the 3' end
nucleotide. The results are shown in Table 2. For all three sets, the
probe with the 3' quencher exhibits a ΔRQ value that is considerably
higher than for the probe with the internal quencher. The RQ− values
suggest that differences in quenching are not as great as those observed
with some of the A1 probes. These results demonstrate that a quencher dye
on the 3' end of an oligonucleotide can quench efficiently the



TABLE 2 Results of 5' Nuclease Assay Comparing Probes with TAMRA Attached to an Internal or 3'-terminal Nucleotide

        518 nm                        582 nm
Probe   no temp.      + temp.         no temp.      + temp.        RQ−           RQ+           ΔRQ
A3-6    54.6 ± 3.2    84.8 ± 3.7      116.2 ± 6.4   115.6 ± 2.5    0.47 ± 0.02   0.73 ± 0.03   0.26 ± 0.04
A3-24   72.1 ± 2.9    236.5 ± 11.1    84.2 ± 4.0    90.2 ± 3.8     0.86 ± 0.02   2.62 ± 0.05   1.76 ± 0.05
P2-7    82.8 ± 4.4    384.0 ± 34.1    105.1 ± 6.4   120.4 ± 10.2   0.79 ± 0.02   3.19 ± 0.16   2.40 ± 0.16
P2-27   113.4 ± 6.6   555.4 ± 14.1    140.7 ± 8.5   118.7 ± 4.8    0.81 ± 0.01   4.68 ± 0.10   3.88 ± 0.10
P5-10   77.5 ± 6.5    244.4 ± 15.9    86.7 ± 4.3    95.8 ± 6.7     0.89 ± 0.05   2.55 ± 0.06   1.66 ± 0.08
P5-28   64.0 ± 5.2    333.6 ± 12.1    100.6 ± 6.1   94.7 ± 6.3     0.63 ± 0.02   3.53 ± 0.12   2.89 ± 0.13

Reactions containing the indicated probes and calculations were performed as described in Materials and Methods and in the legend to Fig. 2.
fluorescence of a reporter dye on the 5'
end. The degree of quenching is suffi- 
cient for this type of oligonucleotide to 
be used as a probe in the 5' nuclease PCR 
assay. 

To test the hypothesis that quenching 
by a 3' TAMRA depends on the flexibility 
of the oligonucleotide, fluorescence was 
measured for probes in the single- 
stranded and double-stranded states. Ta- 
ble 3 reports the fluorescence observed 
at 518 and 582 nm. The relative degree 
of quenching is assessed by calculating 
the RQ ratio. For probes with TAMRA 
6-10 nucleotides from the 5' end, there 
is little difference in the RQ values when 
comparing single-stranded with double- 
stranded oligonucleotides. The results 
for probes with TAMRA at the 3' end are 
much different. For these probes, hy- 
bridization to a complementary strand 
causes a dramatic increase in RQ. We 
propose that this loss of quenching is 
caused by the rigid structure of double- 
stranded DNA, which prevents the 5' 
and 3' ends from being in proximity. 
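
The contrast described in this paragraph can be made concrete with a short Python sketch that computes, for each probe, how much the RQ ratio rises on hybridization. The RQ values are the single-stranded and double-stranded entries from Table 3; the dictionary layout, the function name, and the boolean flag marking 3'-labeled probes are illustrative assumptions.

```python
# (RQ single-stranded, RQ double-stranded, TAMRA at 3' end?) from Table 3.
TABLE3_RQ = {
    "A1-7":  (0.45, 0.50, False),
    "A1-26": (0.81, 5.43, True),
    "A3-6":  (0.43, 0.38, False),
    "A3-24": (0.45, 3.21, True),
    "P2-7":  (0.64, 0.58, False),
    "P2-27": (0.61, 5.25, True),
    "P5-10": (0.44, 0.87, False),
    "P5-28": (0.46, 4.43, True),
}


def hybridization_fold_change(rq_ss, rq_ds):
    """Ratio of double-stranded to single-stranded RQ; values well above 1
    indicate loss of quenching on hybridization."""
    return rq_ds / rq_ss


fold = {name: hybridization_fold_change(ss, ds)
        for name, (ss, ds, _) in TABLE3_RQ.items()}
internal = [fold[n] for n, (_, _, at_3p) in TABLE3_RQ.items() if not at_3p]
three_prime = [fold[n] for n, (_, _, at_3p) in TABLE3_RQ.items() if at_3p]

for name, value in fold.items():
    print(f"{name}: {value:.1f}-fold")
```

On these values every 3'-labeled probe shows a larger fold change than any internally labeled probe, which is the pattern the text attributes to the rigidity of double-stranded DNA.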

When TAMRA is placed toward the 3' end, there is a marked Mg2+ effect on
quenching. Figure 3 shows a plot of observed RQ values for the A1 series
of probes as a function of Mg2+ concentration. With TAMRA attached near
the 5' end (probe A1-2 or A1-7), the RQ value at 0 mM Mg2+ is only
slightly higher than RQ at 10 mM Mg2+. For probes A1-19, A1-22, and A1-26,
the RQ values at 0 mM Mg2+ are very high, indicating a much reduced
quenching efficiency. For each of these probes, there is a marked decrease
in RQ at 1 mM Mg2+ followed by a gradual decline as the Mg2+ concentration
increases to 10 mM. Probe A1-14 shows an intermediate RQ value at 0 mM
Mg2+ with a gradual decline at higher Mg2+ concentrations. In a low-salt
environment with no Mg2+ present, a single-stranded oligonucleotide would
be expected to adopt an extended conformation because of electrostatic
repulsion. The binding of Mg2+ ions acts to shield the negative charge of
the phosphate backbone so that the oligonucleotide can adopt conformations
where the 3' end is close to the 5' end. Therefore, the observed Mg2+
effects support the notion that quenching of a 5' reporter dye by TAMRA at
or near the 3' end depends on the flexibility of the oligonucleotide.

DISCUSSION 

The striking finding of this study is that 
it seems the rhodamine dye TAMRA, 
placed at any position in an oligonucleotide,
can quench the fluorescent emis-
sion of a fluorescein (6-FAM) placed at 
the 5' end. This implies that a single- 
stranded, double-labeled oligonucle- 
otide must be able to adopt conforma- 
tions where the TAMRA is close to the 5' 
end. It should be noted that the decay of 
6-FAM in the excited state requires a cer- 
tain amount of time. Therefore, what 



TABLE 3 Comparison of Fluorescence Emissions of Single-stranded and
Double-stranded Fluorogenic Probes

         518 nm             582 nm             RQ
Probe    ss       ds        ss       ds        ss      ds
A1-7     27.75    68.53     61.08    138.18    0.45    0.50
A1-26    43.31    509.38    53.50    93.86     0.81    5.43
A3-6     16.75    62.88     39.33    165.57    0.43    0.38
A3-24    30.05    578.64    67.72    140.25    0.45    3.21
P2-7     35.02    70.13     54.63    121.09    0.64    0.58
P2-27    39.89    320.47    65.10    61.13     0.61    5.25
P5-10    27.34    144.85    61.95    165.54    0.44    0.87
P5-28    33.65    462.29    72.39    104.61    0.46    4.43

(ss) Single-stranded. The fluorescence emissions at 518 or 582 nm for solutions containing a final
concentration of 50 nM indicated probe, 10 mM Tris-HCl (pH 8.3), 50 mM KCl, and 10 mM MgCl2.
(ds) Double-stranded. The solutions contained, in addition, 100 nM A1C for probes A1-7 and
A1-26, 100 nM A3C for probes A3-6 and A3-24, 100 nM P2C for probes P2-7 and P2-27, or 100 nM
P5C for probes P5-10 and P5-28. Before the addition of MgCl2, 120 µl of each sample was heated
at 95°C for 5 min. Following the addition of 80 µl of 25 mM MgCl2, each sample was allowed to
cool to room temperature and the fluorescence emissions were measured. Reported values are
the average of three determinations.



matters for quenching is not the average 
distance between 6-FAM and TAMRA 
but, rather, how close TAMRA can get to 
6-FAM during the lifetime of the 6-FAM 
excited state. As long as the decay time of 
the excited state is relatively long com- 
pared with the molecular motions of the 
oligonucleotide, quenching can occur. 
Thus, we propose that TAMRA at the 3' 
end, or any other position, can quench 
6-FAM at the 5' end because TAMRA is in 
proximity to 6-FAM often enough to be 
able to accept energy transfer from an 
excited 6-FAM. 

Details of the fluorescence measure- 
ments remain puzzling. For example, Ta- 
ble 3 shows that hybridization of probes 
Al-26, A3-24, and P5-28 to their comple- 
mentary strands not only causes a large 
increase in 6-FAM fluorescence at 518 
nm but also causes a modest increase in 
TAMRA fluorescence at 582 nm. If
TAMRA is being excited by energy trans- 
fer from quenched 6-FAM, then loss of 
quenching attributable to hybridization 
should cause a decrease in the fluores- 
cence emission of TAMRA. The fact that 
the fluorescence emission of TAMRA in- 
creases indicates that the situation is 
more complex. For example, we have an- 
ecdotal evidence that the bases of the 
oligonucleotide, especially G, quench 
the fluorescence of both 6-FAM and 
TAMRA to some degree. When double- 
stranded, base-pairing may reduce the 
ability of the bases to quench. The pri- 
mary factor causing the quenching of 
6-FAM in an intact probe is the TAMRA 
dye. Evidence for the importance of 
TAMRA is that 6-FAM fluorescence 
remains relatively unchanged when 
probes labeled only with 6-FAM are used 
in the 5' nuclease PCR assay (data not 
shown). Secondary effectors of fluores- 
cence, both before and after cleavage of 
the probe, need to be explored further. 

Regardless of the physical mechanism, the relative independence of
position and quenching greatly simplifies the design of probes for the 5'
nuclease PCR assay. There are three main factors that determine the
performance of a double-labeled fluorescent probe in the 5' nuclease PCR
assay. The first factor is the degree of quenching observed in the intact
probe. This is characterized by the value of RQ−, which is the ratio of
reporter to quencher fluorescent emissions for a no-template control PCR.
Influences on the value of RQ− include the particular reporter and quencher
FIGURE 3 Effect of Mg2+ concentration on RQ ratio for the A1 series of probes. The fluorescence
emission intensity at 518 and 582 nm was measured for solutions containing 50 nM probe, 10 mM
Tris-HCl (pH 8.3), 50 mM KCl, and varying amounts (0-10 mM) of MgCl2. The calculated RQ
ratios (518 nm intensity divided by 582 nm intensity) are plotted vs. MgCl2 concentration (mM
Mg). The key (upper right) shows the probes examined.



dyes used, spacing between reporter and quencher dyes, nucleotide sequence
context effects, presence of structure or other factors that reduce
flexibility of the oligonucleotide, and purity of the probe. The second
factor is the efficiency of hybridization, which depends on probe Tm,
presence of secondary structure in probe or template, annealing
temperature, and other reaction conditions. The third factor is the
efficiency at which Taq DNA polymerase cleaves the bound probe between the
reporter and quencher dyes. This cleavage is dependent on sequence
complementarity between probe and template as shown by the observation
that mismatches in the segment between reporter and quencher dyes
drastically reduce the cleavage of probe.(1)

The rise in RQ− values for the A1 series of probes seems to indicate that
the degree of quenching is reduced somewhat as the quencher is placed
toward the 3' end. The lowest apparent quenching is observed for probe
A1-19 (see Fig. 3) rather than for the probe where the TAMRA is at the 3'
end (A1-26). This is understandable, as the conformation of the 3' end
position would be expected to be less restricted than the conformation of
an internal position. In effect, a quencher at the 3' end is freer to
adopt conformations close to the 5' reporter dye than is an internally
placed quencher. For the other three sets of probes, the interpretation of
RQ− values is less clear-cut. The A3 probes show the same trend as A1,
with the 3' TAMRA probe having a larger RQ− than the internal TAMRA probe.
For the P2 pair, both probes have about the same RQ− value. For the P5
probes, the RQ− for the 3' probe is less than for the internally labeled
probe. Another factor that may explain some of the observed variation is
that purity affects the RQ− value. Although all probes are HPLC purified,
a small amount of contamination with unquenched reporter can have a large
effect on RQ−.

Although there may be a modest effect on degree of quenching, the position
of the quencher apparently can have a large effect on the efficiency of
probe cleavage. The most drastic effect is observed with probe A1-2, where
placement of the TAMRA on the second nucleotide reduces the efficiency of
cleavage to almost zero. For the A3, P2, and P5 probes, ΔRQ is much
greater for the 3' TAMRA probes as compared with the internal TAMRA
probes. This is explained most easily by assuming that probes with TAMRA
at the 3' end are more likely to be cleaved between reporter and quencher
than are probes with TAMRA attached internally. For the A1 probes, the
cleavage efficiency of probe A1-7 must already be quite high, as ΔRQ does
not increase when the quencher is placed closer to the 3' end. This
illustrates the importance of being able to use probes with a quencher on
the 3' end in the 5' nuclease PCR assay. In this assay, an increase in the
intensity of reporter fluorescence is observed only when the probe is
cleaved between the reporter and quencher dyes. By placing the reporter
and quencher dyes on the opposite ends of an oligonucleotide probe, any
cleavage that occurs will be detected. When the quencher is attached to an
internal nucleotide, sometimes the probe works well (A1-7) and other times
not so well (A3-6). The relatively poor performance of probe A3-6
presumably means the probe is being cleaved 3' to the quencher rather than
between the reporter and quencher. Therefore, the best chance of having a
probe that reliably detects accumulation of PCR product in the 5' nuclease
PCR assay is to use a probe with the reporter and quencher dyes on
opposite ends.

Placing the quencher dye on the 3' end may also provide a slight benefit
in terms of hybridization efficiency. The presence of a quencher attached
to an internal nucleotide might be expected to disrupt base-pairing and
reduce the Tm of a probe. In fact, a 2°C-3°C reduction in Tm has been
observed for two probes with internally attached TAMRAs.(9) This
disruptive effect would be minimized by placing the quencher at the 3'
end. Thus, probes with 3' quenchers might exhibit slightly higher
hybridization efficiencies than probes with internal quenchers.

The combination of increased cleavage and hybridization efficiencies means
that probes with 3' quenchers probably will be more tolerant of mismatches
between probe and target as compared with internally labeled probes. This
tolerance of mismatches can be advantageous, as when trying to use a
single probe to detect PCR-amplified products from samples of different
species. Also, it means that cleavage of probe during PCR is less
sensitive to alterations in annealing temperature or other reaction
conditions. The one application where tolerance of mismatches may be a
disadvantage is for allelic discrimination. Lee et al.(1) demonstrated
that allele-specific probes were cleaved between reporter and quencher
only when hybridized to a perfectly complementary target. This allowed
them to distinguish the normal human cystic fibrosis allele from the ΔF508
mutant. Their probes had TAMRA attached to the seventh nucleotide from

PCR Methods and Applications $61 
the 5' end and were designed so that any
mismatches were between the reporter 
and quencher. Increasing the distance 
between reporter and quencher would 
lessen the disruptive effect of mis- 
matches and allow cleavage of the probe 
on the incorrect target. Thus, probes 
with a quencher attached to an internal 
nucleotide may still be useful for allelic 
discrimination. 

In this study loss of quenching upon 
hybridization was used to show that 
quenching by a 3' TAMRA is dependent 
on the flexibility of a single-stranded oli- 
gonucleotide. The increase in reporter 
fluorescence intensity, though, could 
also be used to determine whether hy- 
bridization has occurred or not. Thus, 
oligonucleotides with reporter and 
quencher dyes attached at opposite ends 
should also be useful as hybridization 
probes. The ability to detect hybridiza- 
tion in real time means that these probes 
could be used to measure hybridization 
kinetics. Also, this type of probe could be 
used to develop homogeneous hybrid- 
ization assays for diagnostics or other applications.
Bagwell et al.(10) describe just
this type of homogeneous assay where 
hybridization of a probe causes an in- 
crease in fluorescence caused by a loss of 
quenching. However, they utilized a 
complex probe design that requires add- 
ing nucleotides to both ends of the 
probe sequence to form two imperfect 
hairpins. The results presented here 
demonstrate that the simple addition of 
a reporter dye to one end of an oligonu- 
cleotide and a quencher dye to the other 
end generates a fluorogenic probe that 
can detect hybridization or PCR amplifi- 
cation. 
Quantitative nucleic acid sequence analysis has had an important role in
many fields of biological research. Measurement of gene expression (RNA)
has been used extensively in monitoring biological responses to various
stimuli (Tan et al. 1991; Huang et al. 1995a,b; Prud'homme et al. 1995).
Quantitative gene analysis (DNA) has been used to determine the genome
quantity of a particular gene, as in the case of the human HER2 gene,
which is amplified in ~30% of breast tumors (Slamon et al. 1987). Gene and
genome quantitation (DNA and RNA) also have been used for analysis of
human immunodeficiency virus (HIV) burden demonstrating changes in the
levels of virus throughout the different phases of the disease (Connor et
al. 1993; Piatak et al. 1993b; Furtado et al. 1995).

Many methods have been described for the quantitative analysis of nucleic
acid sequences (both for RNA and DNA; Southern 1975; Sharp et al. 1980;
Thomas 1980). Recently, PCR has proven to be a powerful tool for
quantitative nucleic acid analysis. PCR and reverse transcriptase (RT)-PCR
have permitted the analysis of minimal starting quantities of nucleic acid
(as little as one cell equivalent). This has made possible many
experiments that could not have been performed with traditional methods.
Although PCR has provided a powerful tool, it is imperative that it be
used properly for quantitation (Raeymaekers 1995). Many early reports of
quantitative PCR and RT-PCR described quantitation of the PCR product but
did not measure the initial target sequence quantity. It is essential to
design proper controls for the quantitation of the initial target
sequences (Ferre 1992; Clementi et al. 1993).

Researchers have developed several methods of quantitative PCR and RT-PCR.
One approach measures PCR product quantity in the log phase of the
reaction before the plateau (Kellogg et al. 1990; Pang et al. 1990). This
method requires that each sample has equal input amounts of nucleic acid
and that each sample under analysis amplifies with identical efficiency up
to the point of quantitative analysis. A gene sequence (contained in all
samples at relatively constant quantities, such as β-actin) can be used
for sample amplification efficiency normalization. Using conventional
methods of PCR detection and quantitation (gel electrophoresis or plate
capture hybridization), it is extremely laborious to assure that all
samples are analyzed during the log phase of the reaction (for both the
target gene and the normalization gene). Another method, quantitative
competitive (QC)-PCR, has been developed and is used widely for PCR
quantitation. QC-PCR relies on the inclusion of an internal control
competitor in each reaction (Becker-Andre 1991; Piatak et al. 1993a,b).
The efficiency of each reaction is normalized to the internal competitor.
A known amount of internal competitor can be added to each sample. To
obtain relative quantitation, the unknown target PCR product is compared
with the known competitor PCR product. Success of a quantitative
competitive PCR assay relies on developing an internal control that
amplifies with the same efficiency as the target molecule. The design of
the competitor and the validation of amplification efficiencies require a
dedicated effort. However, because QC-PCR does not require that PCR
products be analyzed during the log phase of the amplification, it is the
easier of the two methods to use.

Several detection systems are used for quantitative PCR and RT-PCR
analysis: (1) agarose gels, (2) fluorescent labeling of PCR products and
detection with laser-induced fluorescence using capillary electrophoresis
(Fasco et al. 1995; Williams et al. 1996) or acrylamide gels, and (3)
plate capture and sandwich probe hybridization (Mulder et al. 1994).
Although these methods proved successful, each method requires post-PCR
manipulations that add time to the analysis and may lead to laboratory
contamination. The sample throughput of these methods is limited (with the
exception of the plate capture approach), and, therefore, these methods
are not well suited for uses demanding high sample throughput (i.e.,
screening of large numbers of biomolecules or analyzing samples for
diagnostics or clinical trials).

Here we report the development of a novel assay for quantitative DNA
analysis. The assay is based on the use of the 5' nuclease assay first
described by Holland et al. (1991). The method uses the 5' nuclease
activity of Taq polymerase to cleave a nonextendible hybridization probe
during the extension phase of PCR. The approach uses dual-labeled
fluorogenic hybridization probes (Lee et al. 1993; Bassler et al. 1995;
Livak et al. 1995a,b). One fluorescent dye serves as a reporter [FAM
(i.e., 6-carboxyfluorescein)] and its emission spectra is quenched by the
second fluorescent dye, TAMRA (i.e., 6-carboxy-tetramethyl-rhodamine). The
nuclease degradation of the hybridization probe releases the quenching of
the FAM fluorescent emission, resulting in an increase in peak fluorescent
emission at 518 nm. The use of a sequence detector (ABI Prism) allows
measurement of fluorescent spectra of all 96 wells of the thermal cycler
continuously during the PCR amplification. Therefore, the reactions are
monitored in real time. The output data is described and quantitative
analysis of input target DNA sequences is discussed below.
RESULTS

PCR Product Detection in Real Time

The gold was to develop a high-throughput, son- 
sitive, and neeuratc gene quantitation asaay for 
use I" monitoring lipid mediated therapeutic 
gene delivery. A plaswld unending human factor 
VIII gene sequence, pI8TM (sec. Methods). w;»s 
used as a model therapeutic gene. The assay usr< 
fluorescent Taqmaii methodology and an instru- 
ment capable of measuring fluorescence in real 
time (AB1 Prism 7700 Sequence TVtcclnr). Ihe 
Tatjma" reaction requires » hybridization pmhe 
lalxlcd witli two different fluorescent dyes. One 
dye is a reporter dye (FAM), the other is a quenching dye (TAMRA).
When the probe is intact, fluorescent energy transfer occurs and the
reporter dye fluorescent emission is absorbed by the quenching dye
(TAMRA). During the extension phase of the PCR cycle, the fluorescent
hybridization probe is cleaved by the 5'-3' nucleolytic activity of the
DNA polymerase. On cleavage of the probe, the reporter dye emission is
no longer transferred efficiently to the quenching dye, resulting in an
increase of the reporter dye fluorescent emission spectra. PCR primers
and probes were designed for the human factor VIII sequence and human
β-actin gene (as described in Methods). Optimization reactions were
performed to choose the appropriate probe and magnesium concentrations
yielding the highest intensity of reporter fluorescent signal without
sacrificing specificity. The instrument uses a charge-coupled device
(i.e., CCD camera) for measuring the fluorescent emission spectra from
500 to 650 nm. Each PCR tube was monitored sequentially for 25 msec
with continuous monitoring throughout the amplification. Each tube was
re-examined every 8.5 sec. Computer software was designed to examine
the fluorescent intensity of both the reporter dye (FAM) and the
quenching dye (TAMRA). The fluorescent intensity of the quenching dye,
TAMRA, changes very little over the course of the PCR amplification
(data not shown). Therefore, the intensity of TAMRA dye emission serves
as an internal standard with which to normalize the reporter dye (FAM)
emission variations. The software calculates a value termed ΔRn (or
ΔRQ) using the following equation: ΔRn = (Rn+) - (Rn-), where Rn+ =
emission intensity of reporter/emission intensity of quencher at any
given time in a reaction tube, and Rn- = emission intensity of
reporter/emission intensity of quencher measured prior to PCR
amplification in that same reaction tube. For the purpose of
quantitation, the last three data points (ΔRns) collected during the
extension step for each PCR cycle were analyzed. The nucleolytic
degradation of the hybridization probe occurs during the extension
phase of PCR, and, therefore, reporter fluorescent emission increases
during this time. The three data points were averaged for each PCR
cycle and the mean value for each was plotted in an "amplification
plot" shown in Figure 1A. The ΔRn mean value is plotted on the y-axis,
and time, represented by cycle number, is plotted on the x-axis. During
the early cycles of the PCR amplification, the ΔRn value remains at
base line. When sufficient hybridization probe has been cleaved by the
Taq polymerase nuclease activity, the intensity of reporter fluorescent
emission increases. Most PCR amplifications reach a plateau phase of
reporter fluorescent emission if the reaction is carried out to high
cycle numbers. The amplification plot is examined early in the
reaction, at a point that represents the log phase of product
accumulation. This is done by assigning an arbitrary threshold that is
based on the variability of the base-line data. In Figure 1A, the
threshold was set at 10 standard deviations above the mean of base line
emission calculated from cycles 1 to 15. Once the threshold is chosen,
the point at which the amplification plot crosses the threshold is
defined as CT. CT is reported as the cycle number at this point. As
will be demonstrated, the CT value is predictive of the quantity of
input target.
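
The ΔRn normalization and the threshold rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the instrument software: the function names are hypothetical, and the linear interpolation between the two cycles that bracket the threshold is an assumption made here to obtain a fractional CT.

```python
from statistics import mean, stdev


def delta_rn(reporter, quencher, reporter_pre, quencher_pre):
    """Return dRn = Rn+ - Rn-, where each Rn is a reporter/quencher ratio.

    reporter, quencher: emission intensities at the current time point.
    reporter_pre, quencher_pre: intensities measured before amplification.
    """
    rn_plus = reporter / quencher
    rn_minus = reporter_pre / quencher_pre
    return rn_plus - rn_minus


def threshold_cycle(delta_rn_by_cycle, baseline_cycles=15, n_sd=10.0):
    """Return the (fractional) cycle at which dRn first crosses the threshold.

    delta_rn_by_cycle[k] holds the mean dRn for cycle k + 1.
    The threshold is mean + n_sd * SD of the first `baseline_cycles` values.
    Returns None when the threshold is never reached.
    """
    baseline = delta_rn_by_cycle[:baseline_cycles]
    threshold = mean(baseline) + n_sd * stdev(baseline)
    for i in range(baseline_cycles, len(delta_rn_by_cycle)):
        prev, cur = delta_rn_by_cycle[i - 1], delta_rn_by_cycle[i]
        if cur >= threshold > prev:
            # Assumption: interpolate linearly between cycle i and cycle i + 1.
            return i + (threshold - prev) / (cur - prev)
    return None


# Hypothetical amplification plot: a noisy base line, then rapid signal increase.
plot = [0.0, 0.02] * 7 + [0.0] + [1.0, 2.0, 4.0]
ct = threshold_cycle(plot)
```

With these made-up values the base line covers cycles 1 to 15 and the signal crosses the threshold between cycles 15 and 16.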

CT Values Provide a Quantitative Measurement of
Input Target Sequences

Figure 1B shows amplification plots of 15 different PCR amplifications
overlaid. The amplifications were performed on a 1:2 serial dilution of
human genomic DNA. The amplified target was human β-actin. The
amplification plots shift to the right (to higher threshold cycles) as
the input target quantity is reduced. This is expected because
reactions with fewer starting copies of the target molecule require
greater amplification to degrade enough probe to attain the threshold
fluorescence. An arbitrary threshold of 10 standard deviations above
the base line was used to determine the CT values. Figure 1C represents
the CT values plotted versus the sample dilution value. Each dilution
was amplified in triplicate PCR amplifications and plotted as mean
values with error bars representing one standard deviation. The CT
values decrease linearly with increasing target quantity. Thus, CT
values can be used as a quantitative measurement of the input target
number. It should be noted that the amplification plot for the 15.6-ng
sample shown in Figure 1B does not reflect the same fluorescent rate of
increase exhibited by most of the other samples. The 15.6-ng sample
also achieves endpoint plateau at a lower fluorescent value than would
be expected based on the input DNA. This phenomenon has been observed
occasionally with other samples (data not shown) and may be
attributable to late cycle inhibition; this hypothesis is still under
investigation. It is important to note that the flattened slope and
early plateau do not impact significantly the calculated CT value, as
demonstrated by the fit on the line shown in Figure 1C. All triplicate
amplifications resulted in very similar CT values; the standard
deviation did not exceed 0.5 for any dilution. This experiment contains
a >100,000-fold range of input target molecules. Using CT values for
quantitation permits a much larger assay range than directly using
total fluorescent emission intensity for quantitation. The linear range
of fluorescent intensity measurement of the ABI Prism 7700 Sequence
Detector [...] measurements over a very large range of relative
starting target quantities.
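
Because CT falls linearly as the logarithm of the input target rises, a dilution series can serve as a standard curve. The following is a minimal sketch under stated assumptions: the dilution quantities and CT values are invented for illustration, the function names are hypothetical, and an ordinary least-squares fit on log10(quantity) is assumed rather than taken from the paper.

```python
import math


def fit_standard_curve(quantities_ng, ct_values):
    """Least-squares fit of CT = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities_ng]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ct_values) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate input quantity from a CT value."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 1:10 dilution series (ng of input DNA) and mean CT values.
slope, intercept = fit_standard_curve([100.0, 10.0, 1.0], [20.0, 23.3, 26.6])
estimate_ng = quantity_from_ct(23.3, slope, intercept)
```

A negative slope reflects the behavior described above: more input target means fewer cycles to reach the threshold.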

Sample Preparation Validation 

Several parameters influence the efficiency of PCR amplification:
magnesium and salt concentrations, reaction conditions (i.e., time and
temperature), PCR target size and composition, primer sequences, and
sample purity. All of the above factors are common to a single PCR
assay, except sample to sample purity. In an effort to validate the
method of sample preparation for the factor VIII assay, PCR
amplification reproducibility and efficiency of 10 replicate sample
preparations were examined. After genomic DNA was prepared from the 10
replicate samples, the DNA was quantitated by ultraviolet spectroscopy.
Amplifications were performed analyzing β-actin gene content in 100 and
25 ng of total genomic DNA. Each PCR amplification was performed in
triplicate. Comparison of CT values for each triplicate sample shows
minimal variation based on standard deviation and coefficient of
variance (Table 1). Therefore, each of the triplicate PCR
amplifications was highly reproducible, demonstrating that real time
PCR using this instrumentation introduces minimal variation into the
quantitative PCR analysis. Comparison of the mean CT values of the 10
replicate sample preparations also showed minimal variability,
indicating that each sample preparation yielded similar results for
β-actin gene quantity. The highest CT difference between any of the
samples was [...] and 0.73 for the 100 and 25 ng samples, respectively.
Additionally, the amplification of each sample exhibited an equivalent
rate of fluorescent emission intensity change per amount of DNA target
analyzed, as indicated by similar slopes derived from the sample
dilutions (Fig. 2). Any sample containing an excess of a PCR inhibitor
would exhibit a greater measured β-actin CT value for a given quantity
of DNA. In addition, the inhibitor would be diluted along with the
sample in the dilution analysis (Fig. 2), altering the expected CT
value change. Each sample amplification yielded a similar result in the
analysis, demonstrating that this method of sample preparation is
highly reproducible with regard to sample purity.

Quantitative Analysis of a Plasmid After

Table 1. Reproducibility of Sample Preparation Method

                     100 ng                                      25 ng
Sample                           standard                                    standard
no.     CT                mean   deviation  CV     CT                mean   deviation  CV

1       18.24 18.23 18.33 18.27  0.06       0.32   20.48 20.55 20.5  20.51  0.03       0.17
2       18.33 18.35 18.44 18.37  0.06       0.32   20.61 20.59 20.41 20.54  0.11       0.54
3       18.3  18.3  18.42 18.34  0.07       0.36   20.54 20.6  20.49 20.54  0.06       0.28
4       18.15 18.23 18.32 18.23  0.08       0.46   20.48 20.44 20.38 20.43  0.05       0.26
5       18.4  18.38 18.46 18.42  0.04       0.23   20.68 20.87 20.63 20.73  0.13       0.61
6       18.54 18.67 19    18.74  0.24       1.26   21.09 21.04 21.04 21.06  0.03       0.15
7       18.28 18.36 18.52 18.39  0.12       0.66   20.67 20.73 20.65 20.68  0.04       0.2
8       18.45 18.7  18.73 18.63  0.16       0.83   20.98 20.84 20.75 20.86  0.12       0.57
9       18.18 18.34 18.36 18.29  0.1        0.55   20.46 20.54 20.48 20.51  0.07       0.32
10      18.42 18.57 18.66 18.55  0.12       0.65   20.79 20.78 20.62 20.73  0.1        0.46

Mean                      18.42  0.17       0.90                     20.66  0.19       0.94



[...] vector containing a partial cDNA for human factor VIII, pF8TM. A
series of transfections was set up using a decreasing amount of the
plasmid (40, 4, 0.5, and 0.1 µg). Twenty-four hours post-transfection,
total DNA was purified from each flask of cells. β-Actin gene quantity
was chosen as a value for normalization of genomic DNA concentration
from each sample. In this experiment, β-actin gene content should
remain constant relative to total genomic DNA. Figure 3 shows the
result of the β-actin DNA measurement (100 ng total DNA determined by
ultraviolet spectroscopy) of each sample. Each sample was analyzed in
triplicate and the mean β-actin CT values of the triplicates were
plotted (error bars represent one standard deviation). The highest
difference between any two sample means was 0.95 CT. Ten nanograms of
total DNA of each sample were also examined for β-actin. The results
again showed that very similar amounts of genomic DNA were present; the
maximum mean β-actin CT value difference was 1.0. As Figure 3 shows,
the rate of β-actin CT change between the 100- and 10-ng samples was
similar (slope values range between -3.56 and -3.45). This verifies
again that the method of sample preparation yields samples of identical
PCR integrity (i.e., no sample contained an excessive amount of a PCR
inhibitor). However, these results indicate that each sample contained
slight differences in the actual amount of genomic DNA analyzed.
Determination of actual genomic DNA concentration was accomplished by
plotting the mean β-actin CT value obtained for each 100-ng sample on a
β-actin standard curve (shown in Fig. 4C). The actual genomic DNA
concentration of each sample, a, was obtained by extrapolation to the
x-axis.

Figure 2 Sample preparation purity. The replicate samples shown in
Table 1 were also amplified in triplicate using 25 ng of each DNA
sample. The figure shows the input DNA concentration (100 and 25 ng)
vs. CT. In the figure, the 100 and 25 ng points for each sample are
connected by a line. [Plot residue omitted; x-axis: log (ng input
genomic DNA).]

Figure 3 Analysis of transfected cell DNA quantity and purity. The DNA
preparations of the four 293 cell transfections (40, 4, 0.5, and 0.1 µg
of pF8TM) were analyzed for the β-actin gene. 100 and 10 ng (determined
by ultraviolet spectroscopy) of each sample were amplified in
triplicate. For each amount of pF8TM that was transfected, the β-actin
CT values are plotted versus the total input DNA [...] [Plot residue
omitted; x-axis: log (ng input DNA).]

Figure 4A shows the measured (i.e., non-normalized) quantities of
factor VIII plasmid DNA (pF8TM) from each of the four transient cell
transfections. Each reaction contained 100 ng of total sample DNA (as
determined by UV spectroscopy). Each sample was analyzed in triplicate
PCR amplifications. As shown, pF8TM purified from the 293 cells
decreases (mean CT values increase) with decreasing amounts of plasmid
transfected. The mean CT values obtained for pF8TM in Figure 4A were
plotted on a standard curve comprised of serially diluted pF8TM, shown
in Figure 4B. The quantity of pF8TM, b, found in each of the four
transfections was determined by extrapolation to the x-axis of the
standard curve in Figure 4B. These uncorrected values, b, for pF8TM
were normalized to determine the actual amount of pF8TM found per 100
ng of genomic DNA by using the equation:

    (b × 100 ng) / a = actual pF8TM copies per 100 ng of genomic DNA

where a = actual genomic DNA in a sample and b = pF8TM copies from the
standard curve. The normalized quantity of pF8TM per 100 ng of genomic
DNA for each of the four transfections is shown in Figure 4D. These
results show that the quantity of factor VIII plasmid associated with
the 293 cells, 24 hr after transfection, decreases with decreasing
plasmid concentration used in the transfection. The quantity of pF8TM
associated with 293 cells, after transfection with 40 µg of plasmid,
was 35 pg per 100 ng genomic DNA. This results in ~520 plasmid copies
per cell.
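
The normalization step above is simple arithmetic and can be written out as a short sketch. The function name and the example numbers are hypothetical; only the formula (b × 100 ng) / a comes from the text.

```python
def normalize_plasmid(b, a_ng, reference_ng=100.0):
    """Scale an uncorrected plasmid quantity b to a fixed amount of genomic DNA.

    b: plasmid quantity read from the plasmid standard curve.
    a_ng: actual genomic DNA in the sample (ng), read from the beta-actin
          standard curve.
    reference_ng: amount of genomic DNA to normalize to (100 ng in the text).
    """
    if a_ng <= 0:
        raise ValueError("actual genomic DNA must be positive")
    return b * reference_ng / a_ng


# Hypothetical sample: 30 units of plasmid measured in a reaction that
# actually contained 80 ng of genomic DNA rather than the nominal 100 ng.
normalized = normalize_plasmid(30.0, 80.0)
```

A sample that contained less genomic DNA than the nominal 100 ng is scaled up, and one that contained more is scaled down, so that samples can be compared on an equal footing.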



DISCUSSION 

We have described a new method for quantitating gene copy numbers using
real-time analysis of PCR amplifications. Real-time PCR is compatible
with either of the two PCR (RT-PCR) approaches: (1) quantitative
competitive, where an internal competitor for each target sequence is
used for normalization (data not shown), or (2) quantitative
comparative PCR using a normalization gene contained within the sample
(i.e., β-actin) or a "housekeeping" gene for RT-PCR. If equal amounts
of nucleic acid are analyzed for each sample and if the amplification
efficiency before quantitative analysis is identical for each sample,
the internal control (normalization gene or competitor) should give
equal signals for all samples.

The real-time PCR method offers several advantages over the other two
methods currently employed (see the Introduction). First, the real-time
PCR method is performed in a closed-tube system and requires no
post-PCR manipulation of sample. Therefore, the potential for PCR
contamination in the laboratory is reduced because amplified products
can be analyzed and disposed of without opening the reaction tubes.
Second, this method supports the use of a normalization gene (i.e.,
β-actin) for quantitative PCR or housekeeping genes for quantitative
RT-PCR controls. Analysis is performed in real time during the log
phase of product accumulation. Analysis during log phase permits many
different genes (over a wide input target range) to be analyzed
simultaneously, without concern of reaching reaction plateau at
different cycles. This will make multigene analysis assays much easier
to develop, because individual internal competitors will not be needed
for each gene under analysis. Third, sample throughput will increase
dramatically with the new method because there is no post-PCR
processing time. Additionally, working in a 96-well format is highly
compatible with automation technology.

The real-time PCR method is highly reproducible. Replicate
amplifications can be analyzed for each sample, minimizing potential
error. The system allows for a very large assay dynamic range
(approaching 1,000,000-fold starting target). Using a standard curve
for the target of interest, relative copy number values can be
determined for any unknown sample. Fluorescent threshold values, CT,
correlate linearly with relative DNA copy numbers. Real time
quantitative RT-PCR methodology (Gibson et al., this issue) has also
been developed. Finally, real time quantitative PCR methodology can be
used to develop high-throughput screening assays for a variety of
applications [quantitative gene expression (RT-PCR), gene copy assays
(Her2, HIV, etc.), genotyping (knockout mouse analysis), and
immuno-PCR].

Real-time PCR may also be performed using intercalating dyes (Higuchi
et al.) such as ethidium bromide. The fluorogenic probe method offers a
major advantage over intercalating dyes: greater specificity (i.e.,
primer dimers and nonspecific PCR products are not detected).

Figure 4 Quantitative analysis of pF8TM in transfected cells. (A)
Amount of plasmid DNA used for the transfection plotted against the
mean CT value determined for pF8TM remaining 24 hr after transfection.
(B, C) Standard curves of pF8TM and β-actin, respectively. pF8TM DNA
(B) and genomic DNA (C) were diluted serially 1:5 before amplification
with the appropriate primers. The β-actin standard curve was used to
normalize the results of A to 100 ng of genomic DNA. (D) The amount of
pF8TM present per 100 ng of genomic DNA.

METHODS

Generation of a Plasmid Containing a Partial
cDNA for Human Factor VIII

Total RNA was harvested (RNAzol B from Tel-Test, Inc., Friendswood, TX)
from cells transfected with a factor VIII expression vector,
pCIS2.8c25D (Eaton et al. 1986; Gorman et al. 1990). A factor VIII
partial cDNA sequence was generated by RT-PCR [GeneAmp EZ rTth RNA PCR
Kit (part N808-[...], PE Applied Biosystems, Foster City, CA)] using
the PCR primers F8for and F8rev (primer sequences are shown below). The
amplicon was reamplified using modified F8for and F8rev primers
appended with BamHI and HindIII restriction site sequences at the 5'
end, and cloned into pGEM-3Z (Promega Corp., Madison, WI). The
resulting clone, pF8TM, was used for transient transfection of 293
cells.

Amplification of Target DNA and Detection of
Amplicon

Factor VIII plasmid DNA (pF8TM) was amplified with the primers F8for
5'-CCCGTGCCAAGAGTGACGTGTC-3' and F8rev 5'-AAACCTCAGCCTGGATGGTCAGG-3'.
The reaction produced a 422-bp PCR product. The forward primer was
designed to recognize a unique sequence found in the 5' untranslated
region of the parent pCIS2.8c25D plasmid and therefore does not
recognize and amplify the human factor VIII gene. Primers were chosen
with the assistance of the computer program Oligo 4.0 (National
Biosciences, Inc., Plymouth, MN). The human β-actin gene was amplified
with the primers β-actin forward primer
5'-TCACCCACACTGTGCCCATCTACGA-3' and β-actin reverse primer
5'-CAGCGGAACCGCTCATTGCCAATGG-3'. The reaction produced a 295-bp PCR
product.

Amplification reactions (50 µl) contained a DNA sample, 10× PCR Buffer
II (5 µl), 200 µM dATP, dCTP, dGTP, and 400 µM dUTP, 4 mM MgCl2, [...]
Units AmpliTaq DNA polymerase, 0.5 unit AmpErase uracil N-glycosylase
(UNG), 50 pmole of each factor VIII primer, and 15 pmole of each
β-actin primer. The reactions also contained one of the following
detection probes (100 nM each): F8 probe
5'-(FAM)AGCTCTCCACCTGCTTCTTTCTGTGCCTT(TAMRA)p-3' and β-actin probe
5'-(FAM)ATGCCC-X(TAMRA)-CCCCCATGCCATCp-3', where p indicates
phosphorylation and X indicates a linker arm nucleotide. Reaction tubes
were MicroAmp Optical Tubes (part number N801[...], Perkin Elmer) that
were frosted (at Perkin Elmer) to prevent light from reflecting. Tube
caps were similar to MicroAmp Caps but specially designed to prevent
light scattering. All of the PCR consumables were supplied by PE
Applied Biosystems (Foster City, CA) except the factor VIII primers,
which were synthesized at Genentech, Inc. (South San Francisco, CA).
Probes were designed using the Oligo 4.0 software, following guidelines
suggested in the Model 7700 Sequence Detector instrument manual.
Briefly, probe Tm should be at least 5°C higher than the annealing
temperature used during thermal cycling; primers should not form stable
duplexes with the probe.

The thermal cycling conditions included 2 min at 50°C and 10 min at
95°C. Thermal cycling proceeded with [...] reactions were performed in
the Model 7700 Sequence Detector (PE Applied Biosystems), which
contains a GeneAmp PCR System 9600. Reaction conditions were programmed
on a Power Macintosh 7100 (Apple Computer, Santa Clara, CA) linked
directly to the Model 7700 Sequence Detector. Analysis of data was also
performed on the Macintosh computer. Collection and analysis software
was developed at PE Applied Biosystems.



Transfection of Cells with Factor VIII Construct

Four T175 flasks of 293 cells (ATCC CRL 1573), a human fetal kidney
suspension cell line, were grown to 80% confluency and transfected with
pF8TM. Cells were grown in the following media: 50% HAM'S F12 without
GHT, 50% low glucose Dulbecco's modified Eagle medium (DMEM) without
glycine with sodium bicarbonate, 10% fetal bovine serum, 2 mM
L-glutamine, and 1% penicillin-streptomycin. The media was changed 30
min before the transfection. pF8TM DNA amounts of 40, 4, 0.5, and 0.1
µg were added to 1.5 ml of a solution containing 0.125 M CaCl2 and 1×
HEPES. The four mixtures were left at room temperature for 10 min and
then added dropwise to the cells. The flasks were incubated at 37°C and
5% CO2 for 24 hr, washed with PBS, and resuspended in PBS. The
resuspended cells were divided into aliquots and DNA was extracted
immediately using the QIAamp Blood Kit (Qiagen, Chatsworth, CA). DNA
was eluted into 200 µl of 20 mM Tris-HCl at pH 8.0.

ABSTRACT Wnt family members are critical to many
developmental processes, and components of the Wnt signal- 
ing pathway have been linked to tumorigenesis in familial and 
sporadic colon carcinomas. Here we report the identification 
of two genes, WISP-1 and WISP-2, that are up-regulated in the 
mouse mammary epithelial cell line C57MG transformed by 
Wnt-1, but not by Wnt-4. Together with a third related gene, 
WISP-3, these proteins define a subfamily of the connective 
tissue growth factor family. Two distinct systems demon- 
strated WISP induction to be associated with the expression of 
Wnt-1. These included (i) C57MG cells infected with a Wnt-1 
retroviral vector or expressing Wnt-1 under the control of a 
tetracycline-repressible promoter, and (ii) Wnt-1 transgenic
mice. The WISP-1 gene was localized to human chromosome
8q24.1-8q24.3. WISP-1 genomic DNA was amplified in colon
cancer cell lines and in human colon tumors and its RNA
overexpressed (2- to >30-fold) in 84% of the tumors examined
compared with patient-matched normal mucosa. WISP-3
mapped to chromosome 6q22-6q23 and also was overex-
pressed (4- to >40-fold) in 63% of the colon tumors analyzed.
In contrast, WISP-2 mapped to human chromosome 20q12-
20q13 and its DNA was amplified, but RNA expression was
reduced (2- to >30-fold) in 79% of the tumors. These results
suggest that the WISP genes may be downstream of Wnt-1 
signaling and that aberrant levels of WISP expression in colon 
cancer may play a role in colon tumorigenesis. 

Wnt-1 is a member of an expanding family of cysteine-rich,
glycosylated signaling proteins that mediate diverse develop- 
mental processes such as the control of cell proliferation, 
adhesion, cell polarity, and the establishment of cell fates (1, 
2). Wnt-1 originally was identified as an oncogene activated by 
the insertion of mouse mammary tumor virus in virus-induced 
mammary adenocarcinomas (3, 4). Although Wnt-1 is not 
expressed in the normal mammary gland, expression of Wnt-1
in transgenic mice causes mammary tumors (5). 

In mammalian cells, Wnt family members initiate signaling 
by binding to the seven-transmembrane spanning Frizzled 
receptors and recruiting the cytoplasmic protein Dishevelled 
(Dsh) to the cell membrane (1, 2, 6). Dsh then inhibits the 
kinase activity of the normally constitutively active glycogen 
synthase kinase-3β (GSK-3β) resulting in an increase in
β-catenin levels. Stabilized β-catenin interacts with the tran-
scription factor TCF/Lef1, forming a complex that appears in
the nucleus and binds TCF/Lef1 target DNA elements to
activate transcription (7, 8). Other experiments suggest that
the adenomatous polyposis coli (APC) tumor suppressor gene
also plays an important role in Wnt signaling by regulating
β-catenin levels (9). APC is phosphorylated by GSK-3β, binds
to β-catenin, and facilitates its degradation. Mutations in
either APC or β-catenin have been associated with colon
carcinomas and melanomas, suggesting these mutations con- 
tribute to the development of these types of cancer, implicating 
the Wnt pathway in tumorigenesis (1). 

Although much has been learned about the Wnt signaling 
pathway over the past several years, only a few of the tran- 
scriptionally activated downstream components activated by 
Wnt have been characterized. Those that have been described 
cannot account for all of the diverse functions attributed to 
Wnt signaling. Among the candidate Wnt target genes are 
those encoding the nodal-related 3 gene, Xnr3, a member of 
the transforming growth factor (TGF)-β superfamily, and the
homeobox genes, engrailed, goosecoid, twin (Xtwn), and siamois 
(2). A recent report also identifies c-myc as a target gene of the 
Wnt signaling pathway (10). 

To identify additional downstream genes in the Wnt signal- 
ing pathway that are relevant to the transformed cell pheno- 
type, we used a PCR-based cDNA subtraction strategy, sup- 
pression subtractive hybridization (SSH) (11), using RNA 
isolated from C57MG mouse mammary epithelial cells and 
C57MG cells stably transformed by a Wnt-1 retrovirus. Over- 
expression of Wnt-1 in this cell line is sufficient to induce a 
partially transformed phenotype, characterized by elongated 
and refractile cells that lose contact inhibition and form a 
multilayered array (12, 13). We reasoned that genes differen- 
tially expressed between these two cell lines might contribute 
to the transformed phenotype. 

In this paper, we describe the cloning and characterization 
of two genes up-regulated in Wnt-1 transformed cells, WISP-1 
and WISP-2, and a third related gene, WISP-3. The WISP genes
are members of the CCN family of growth factors, which 
includes connective tissue growth factor (CTGF), Cyr61, and 
nov, a family not previously linked to Wnt signaling. 

MATERIALS AND METHODS 

SSH. SSH was performed by using the PCR-Select cDNA
Subtraction Kit (CLONTECH). Tester double-stranded
cDNA was synthesized from 2 µg of poly(A)+ RNA isolated
from the C57MG/Wnt-1 cell line and driver cDNA from 2 µg
of poly(A)+ RNA from the parent C57MG cells. The sub-
tracted cDNA library was subcloned into a pGEM-T vector for
further analysis.

Abbreviations: TGF, transforming growth factor; CTGF, connective
tissue growth factor; SSH, suppression subtractive hybridization;
VWC, von Willebrand factor type C module.

Data deposition: The sequences reported in this paper have been
deposited in the GenBank database (accession nos. AF100777,
AF100778, AF100779, AF100780, and AF100781).
To whom reprint requests should be addressed. e-mail: diane@gene.com.

cDNA Library Screening. Clones encoding full-length
mouse WISP-1 were isolated by screening a λgt10 mouse
embryo cDNA library (CLONTECH) with a 70-bp probe from
the original partial clone 568 sequence corresponding to amino
acids 128-169. Clones encoding full-length human WISP-1
were isolated by screening λgt10 lung and fetal kidney cDNA
libraries with the same probe at low stringency. Clones en-
coding full-length mouse and human WISP-2 were isolated by
screening a C57MG/Wnt-1 or human fetal lung cDNA library
with a probe corresponding to nucleotides 1463-1512. Full-
length cDNAs encoding WISP-3 were cloned from human
bone marrow and fetal kidney libraries.

Expression of Human WISP RNA. PCR amplification of 
first-strand cDNA was performed with human Multiple Tissue 
cDNA panels (CLONTECH) and 300 of each dNTP at 
94°C for 1 sec, 62°C for 30 sec, 72°C for 1 min, for 22-32 cycles. 
WISP and glyceraldehyde-3-phosphate dehydrogenase primer 
sequences are available on request. 

In Situ Hybridization. 33P-labeled sense and antisense ribo-
probes were transcribed from an 897-bp PCR product corre-
sponding to nucleotides 601-1440 of mouse WISP-1 or a
294-bp PCR product corresponding to nucleotides 82-375 of 
mouse WISP-2. All tissues were processed as described (40). 

Radiation Hybrid Mapping. Genomic DNA from each 
hybrid in the Stanford G3 and Genebridge4 Radiation Hybrid 
Panels (Research Genetics, Huntsville, AL) and human and 
hamster control DNAs were PCR-amplified, and the results 
were submitted to the Stanford or Massachusetts Institute of 
Technology web servers. 

Cell Lines, Tumors, and Mucosa Specimens. Tissue speci- 
mens were obtained from the Department of Pathology (Uni- 
versity of Pittsburgh) for patients undergoing colon resection 
and from the University of Leeds, United Kingdom. Genomic 
DNA was isolated (Qiagen) from the pooled blood of 10 
normal human donors, surgical specimens, and the following 
ATCC human cell lines: SW480, COLO 320DM, HT-29, 
WiDr, and SW403 (colon adenocarcinomas), SW620 (lymph 
node metastasis, colon adenocarcinoma), HCT 116 (colon 
carcinoma), SK-CO-1 (colon adenocarcinoma, ascites), and 
HM7 (a variant of ATCC colon adenocarcinoma cell line LS 
174T). DNA concentration was determined by using Hoechst 
dye 33258 intercalation fluorimetry. Total RNA was prepared
by homogenization in 7 M GuSCN followed by centrifugation 
over CsCl cushions or prepared by using RNAzol. 

Gene Amplification and RNA Expression Analysis. Relative
gene amplification and RNA expression of WISPs and c-myc in
the cell lines, colorectal tumors, and normal mucosa were
determined by quantitative PCR. Gene-specific primers and
fluorogenic probes (sequences available on request) were
designed and used to amplify and quantitate the genes. The
relative gene copy number was derived by using the formula
2^(ΔCt), where ΔCt represents the difference in amplification
cycles required to detect the WISP genes in peripheral blood
lymphocyte DNA compared with colon tumor DNA or colon
tumor RNA compared with normal mucosal RNA. The
δ-method was used for calculation of the SE of the gene copy
number or RNA expression level. The WISP-specific signal was
normalized to that of the glyceraldehyde-3-phosphate dehy-
drogenase housekeeping gene. All TaqMan assay reagents
were obtained from Perkin-Elmer Applied Biosystems.
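
The 2^(ΔCt) formula can be illustrated with a short sketch. The function name and Ct values are hypothetical, and two points are assumptions rather than statements from the paper: that product doubles every cycle, and that the housekeeping-gene normalization is applied by subtracting the housekeeping gene's ΔCt from the target's ΔCt.

```python
def relative_level(ct_reference, ct_sample,
                   hk_ct_reference=None, hk_ct_sample=None):
    """Return the fold difference 2 ** dCt of a target in sample vs. reference.

    ct_reference: Ct of the target in the reference (e.g. normal) material.
    ct_sample: Ct of the target in the sample of interest (e.g. tumor).
    hk_ct_reference, hk_ct_sample: optional Ct values of a housekeeping gene
        in the same two materials; when given, their difference is subtracted
        (an assumed way of applying the normalization).
    """
    delta_ct = ct_reference - ct_sample
    if hk_ct_reference is not None and hk_ct_sample is not None:
        delta_ct -= hk_ct_reference - hk_ct_sample
    # Assumes perfect doubling of product in every cycle.
    return 2.0 ** delta_ct


# Hypothetical values: the target is detected 2 cycles earlier in the sample.
fold_raw = relative_level(25.0, 23.0)
# Same target, corrected for a 1-cycle difference in the housekeeping gene.
fold_normalized = relative_level(25.0, 23.0, 20.0, 19.0)
```

A target detected two cycles earlier corresponds to a 4-fold higher level; a value below 1 would indicate a lower level in the sample than in the reference.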

RESULTS 

Isolation of WISP-1 and WISP-2 by SSH. To identify Wnt-
1-inducible genes, we used the technique of SSH using the
mouse mammary epithelial cell line C57MG and C57MG cells
that stably express Wnt-1 (11). Candidate differentially ex- 
pressed cDNAs (1,384 total) were sequenced. Thirty-nine 
percent of the sequences matched known genes or homo- 
logues, 32% matched expressed sequence tags, and 29% had 
no match. To confirm that the transcript was differentially 
expressed, semiquantitative reverse transcription-PCR and 
Northern analysis were performed by using mRNA from the 
C57MG and C57MG/Wnt-1 cells. 

Two of the cDNAs, WISP-1 and WISP-2, were differentially 
expressed, being induced in the C57MG/Wnt-1 cell line, but 
not in the parent C57MG cells or C57MG cells overexpressing 
Wnt-4 (Fig. 1 A and B). Wnt-4, unlike Wnt-1, does not induce 
the morphological transformation of C57MG cells and has no 
effect on β-catenin levels (13, 14). Expression of WISP-1 was
up-regulated approximately 3-fold in the C57MG/Wnt-1 cell 
line and WISP-2 by approximately 5-fold by both Northern 
analysis and reverse transcription-PCR. 

An independent, but similar, system was used to examine 
WISP expression after Wnt-1 induction. C57MG cells express- 
ing the Wnt-1 gene under the control of a tetracycline- 
repressible promoter produce low amounts of Wnt-1 in the 
repressed state but show a strong induction of Wnt-1 mRNA 
and protein within 24 hr after tetracycline removal (8). The 
levels of Wnt-1 and WISP RNA isolated from these cells at 
various times after tetracycline removal were assessed by 
quantitative PCR. Strong induction of Wnt-1 mRNA was seen 
as early as 10 hr after tetracycline removal. Induction of WISP 
mRNA (2- to 6-fold) was seen at 48 and 72 hr (data not shown). 
These data support our previous observations that show that 
WISP induction is correlated with Wnt-1 expression. Because 
the induction is slow, occurring after approximately 48 hr, the 
induction of WISPs may be an indirect response to Wnt-1 
signaling. 

cDNA clones of human WISP-1 were isolated and the 
sequence compared with mouse WISP-1. The cDNA sequences 
of mouse and human WISP-1 were 1,766 and 2,830 bp in length, 
respectively, and encode proteins of 367 aa, with predicted 
relative molecular masses of ≈40,000 (Mr 40 K). Both have 
hydrophobic N-terminal signal sequences, 38 conserved cys- 
teine residues, and four potential N-linked glycosylation sites 
and are 84% identical (Fig. 2A). 

Full-length cDNA clones of mouse and human WISP-2 were 
1,734 and 1,293 bp in length, respectively, and encode proteins 
of 251 and 250 aa, respectively, with predicted relative molecular 
masses of ≈27,000 (Mr 27 K) (Fig. 2B). Mouse and human 
WISP-2 are 73% identical. Human WISP-2 has no potential 
N-linked glycosylation sites, and mouse WISP-2 has one at 

Fig. 1. WISP-1 and WISP-2 are induced by Wnt-1, but not Wnt-4, 
expression in C57MG cells. Northern analysis of WISP-1 (A) and 
WISP-2 (B) expression in C57MG, C57MG/Wnt-1, and C57MG/Wnt-4 
cells. Poly(A)+ RNA (2 μg) was subjected to Northern blot 
analysis and hybridized with a 70-bp mouse WISP-1-specific probe 
(amino acids 278-300) or a 190-bp WISP-2-specific probe (nucleotides 
1438-1627) in the 3' untranslated region. Blots were rehybridized with 
human β-actin probe. 





Cell Biology, Medical Sciences: Pennica et al. 






Fig. 2. Encoded amino acid sequence alignment of mouse and 
human WISP-1 (A) and mouse and human WISP-2 (B). The potential 
signal sequence, insulin-like growth factor-binding protein (IGF-BP), 
VWC, thrombospondin (TSP), and C-terminal (CT) domains are 
underlined. 

position 197. WISP-2 has 28 cysteine residues that are con- 
served among the 38 cysteines found in WISP-1. 

Identification of WISP-3. To search for related proteins, we 
screened expressed sequence tag (EST) databases with the 
WISP-1 protein sequence and identified several ESTs as 
potentially related sequences. We identified a homologous 
protein that we have called WISP-3. A full-length human 
WISP-3 cDNA of 1,371 bp was isolated corresponding to those 
ESTs that encode a 354-aa protein with a predicted molecular 
mass of 39,293. WISP-3 has two potential N-linked glycosylation 
sites and 36 cysteine residues. An alignment of the three 
human WISP proteins shows that WISP-1 and WISP-3 are the 
most similar (42% identity), whereas WISP-2 has 37% identity 
with WISP-1 and 32% identity with WISP-3 (Fig. 3A). 

WISPs Are Homologous to the CTGF Family of Proteins. 
Human WISP-1, WISP-2, and WISP-3 are novel sequences; 
however, mouse WISP-1 is the same as the recently identified 
Elm1 gene. Elm1 is expressed in low, but not high, metastatic 
mouse melanoma cells, and suppresses the in vivo growth and 
metastatic potential of K-1735 mouse melanoma cells (15). 
Human and mouse WISP-2 are homologous to the recently 
described rat gene, rCop-1 (16). Significant homology (36- 
44%) was seen to the CCN family of growth factors. This family 
includes three members, CTGF, Cyr61, and the protoonco- 
gene nov. CTGF is a chemotactic and mitogenic factor for 
fibroblasts that is implicated in wound healing and fibrotic 
disorders and is induced by TGF-β (17). Cyr61 is an extracellular 
matrix signaling molecule that promotes cell adhesion, 
proliferation, migration, angiogenesis, and tumor growth (18, 
19). nov (nephroblastoma overexpressed) is an immediate 
early gene associated with quiescence and found altered in 
Wilms tumors (20). The proteins of the CCN family share 
functional, but not sequence, similarity to Wnt-1. All are 
secreted, cysteine-rich heparin binding glycoproteins that as- 
sociate with the cell surface and extracellular matrix. 

WISP proteins exhibit the modular architecture of the CCN 
family, characterized by four conserved cysteine-rich domains 
(Fig. 3B) (21). The N-terminal domain, which includes the first 
12 cysteine residues, contains a consensus sequence (GCGCCXXC) 
conserved in most insulin-like growth factor (IGF)- 

Fig. 3. (A) Encoded amino acid sequence alignment of human 
WISPs. The cysteine residues of WISP-1 and WISP-2 that are not 
present in WISP-3 are indicated with a dot. (B) Schematic representation 
of the WISP proteins showing the domain structure and cysteine 
residues (vertical lines). The four cysteine residues in the VWC domain 
that are absent in WISP-3 are indicated with a dot. (C) Expression of 
WISP mRNA in human tissues. PCR was performed on human 
multiple-tissue cDNA panels (CLONTECH) from the indicated adult 
and fetal tissues. 

binding proteins (BP). This sequence is conserved in WISP-2 
and WISP-3, whereas WISP-1 has a glutamine in the third 
position instead of a glycine. CTGF recently has been shown 
to specifically bind IGF (22) and a truncated nov protein 
lacking the IGF-BP domain is oncogenic (23). The von Wil- 
lebrand factor type C module (VWC), also found in certain 
collagens and mucins, covers the next 10 cysteine residues, and 
is thought to participate in protein complex formation and 
oligomerization (24). The VWC domain of WISP-3 differs 
from all CCN family members described previously, in that it 
contains only six of the 10 cysteine residues (Fig. 3 A and B). 
A short variable region follows the VWC domain. The third 
module, the thrombospondin (TSP) domain is involved in 
binding to sulfated glycoconjugates and contains six cysteine 
residues and a conserved WSxCSxxCG motif first identified in 
thrombospondin (25). The C-terminal (CT) module contain- 
ing the remaining 10 cysteines is thought to be involved in 
dimerization and receptor binding (26). The CT domain is 
present in all CCN family members described to date but is 
absent in WISP-2 (Fig. 3 A and B). The existence of a putative 
signal sequence and the absence of a transmembrane domain 
suggest that WISPs are secreted proteins, an observation 
supported by an analysis of their expression and secretion from 
mammalian cell and baculovirus cultures (data not shown). 

Expression of WISP mRNA in Human Tissues. Tissue-specific 
expression of human WISPs was characterized by PCR 
analysis on adult and fetal multiple tissue cDNA panels. 
WISP-1 expression was seen in the adult heart, kidney, lung, 
pancreas, placenta, ovary, small intestine, and spleen (Fig. 3C). 
Little or no expression was detected in the brain, liver, skeletal 
muscle, colon, peripheral blood leukocytes, prostate, testis, or 
thymus. WISP-2 had a more restricted tissue expression and 
was detected in adult skeletal muscle, colon, ovary, and fetal 
lung. Predominant expression of WISP-3 was seen in adult 
kidney and testis and fetal kidney. Lower levels of WISP-3 
expression were detected in placenta, ovary, prostate, and 
small intestine. 

In Situ Localization of WISP-1 and WISP-2. Expression of 
WISP-1 and WISP-2 was assessed by in situ hybridization in 
mammary tumors from Wnt-1 transgenic mice. Strong expression 
of WISP-1 was observed in stromal fibroblasts lying within 
the fibrovascular tumor stroma (Fig. 4 A-D). However, low-level 
WISP-1 expression also was observed focally within tumor 
cells (data not shown). No expression was observed in normal 
breast. Like WISP-1, WISP-2 expression also was seen in the 
tumor stroma in breast tumors from Wnt-1 transgenic animals 
(Fig. 4 E-H). However, WISP-2 expression in the stroma was 
in spindle-shaped cells adjacent to capillary vessels, whereas 

Fig. 4. (A, C, E, and G) Representative hematoxylin/eosin-stained 
images from breast tumors in Wnt-1 transgenic mice. The corresponding 
dark-field images showing WISP-1 expression are shown in B and 
D. The tumor is a moderately well-differentiated adenocarcinoma 
showing evidence of adenoid cystic change. At low power (A and B), 
expression of WISP-1 is seen in the delicate branching fibrovascular 
tumor stroma (arrowhead). At higher magnification, expression is seen 
in the stromal (s) fibroblasts (C and D), and tumor cells are negative. 
Focal expression of WISP-1, however, was observed in tumor cells in 
some areas. Images of WISP-2 expression are shown in E-H. At low 
power (E and F), expression of WISP-2 is seen in cells lying within the 
fibrovascular tumor stroma. At higher magnification, these cells 
appeared to be adjacent to capillary vessels whereas tumor cells are 
negative (G and H). 



the predominant cell type expressing WISP-1 was the stromal 
fibroblasts. 

Chromosome Localization of the WISP Genes. The chro- 
mosomal location of the human WISP genes was determined 
by radiation hybrid mapping panels. WISP-1 is approximately 
3.48 cR from the meiotic marker AFM259xc5 [logarithm of 
odds (lod) score 16.31] on chromosome 8q24.1 to 8q24.3, in the 
same region as the human locus of the novH family member 
(27) and roughly 4 Mbs distal to c-myc (28). Preliminary fine 
mapping indicates that WISP-1 is located near D8S1712 STS. 
WISP-2 is linked to the marker SHGC-33922 (lod = 1,000) on 
chromosome 20q12-20q13.1. Human WISP-3 mapped to chromosome 
6q22-6q23 and is linked to the marker AFM211ze5 
(lod = 1,000). WISP-3 is approximately 18 Mbs proximal to 
CTGF and 23 Mbs proximal to the human cellular oncogene 
MYB (27, 29). 

Amplification and Aberrant Expression of WISPs in Human 
Colon Tumors. Amplification of protooncogenes is seen in 
many human tumors and has etiological and prognostic sig- 
nificance. For example, in a variety of tumor types, c-myc 
amplification has been associated with malignant progression 
and poor prognosis (30). Because WISP-1 resides in the same 
general chromosomal location (8q24) as c-myc, we asked 
whether it was a target of gene amplification, and, if so, 
whether this amplification was independent of the c-myc locus. 
Genomic DNA from human colon cancer cell lines was 
assessed by quantitative PCR and Southern blot analysis (Fig. 
5 A and B). Both methods detected similar degrees of WISP-1 
amplification. Most cell lines showed significant (2- to 4-fold) 
amplification, with the HT-29 and WiDr cell lines demonstrat- 
ing an 8-fold increase. Significantly, the pattern of amplifica- 
tion observed did not correlate with that observed for c-myc, 
indicating that the c-myc gene is not part of the amplicon that 
involves the WISP-1 locus. 

We next examined whether the WISP genes were amplified 
in a panel of 25 primary human colon adenocarcinomas. The 
relative WISP gene copy number in each colon tumor DNA 
was compared with pooled normal DNA from 10 donors by 
quantitative PCR (Fig. 6). The copy number of WISP-1 and 
WISP-2 was significantly greater than one, approximately 
2-fold for WISP-1 in about 60% of the tumors and 2- to 4-fold 
for WISP-2 in 92% of the tumors (P < 0.001 for each). The 
copy number for WISP-3 was indistinguishable from one (P = 
0.166). In addition, the copy number of WISP-2 was significantly 
higher than that of WISP-1 (P < 0.001). 
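
As a rough illustration of how a relative gene copy number can be derived from quantitative PCR, the sketch below applies the widely used 2^-ΔΔCt calculation. This section does not describe the calculation actually used, so the threshold-cycle (Ct) inputs, the reference gene, the assumption of 100% amplification efficiency, the function name, and every number are hypothetical and for illustration only.

```python
def relative_copy_number(ct_target_tumor, ct_ref_tumor,
                         ct_target_normal, ct_ref_normal):
    """Relative copy number of a target gene in tumor vs. pooled normal DNA.

    Assumes (hypothetically) perfect doubling per PCR cycle, so a
    difference of one cycle corresponds to a 2-fold difference in template.
    """
    # Normalize the target gene to a reference gene within each sample.
    delta_tumor = ct_target_tumor - ct_ref_tumor
    delta_normal = ct_target_normal - ct_ref_normal
    # Compare the normalized tumor value with the normalized normal value.
    delta_delta = delta_tumor - delta_normal
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold one cycle earlier
# in tumor DNA than in normal DNA, relative to the reference gene.
fold = relative_copy_number(24.0, 22.0, 25.0, 22.0)
print(f"relative copy number: {fold:.1f}-fold")
```

Under these assumptions a value near 1 means no amplification relative to normal DNA, and a value near 2 corresponds to the roughly 2-fold increase described for WISP-1.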

The levels of WISP transcripts in RNA isolated from 19 
adenocarcinomas and their matched normal mucosa were 

Fig. 5. Amplification of WISP-1 genomic DNA in colon cancer cell 
lines. (A) Amplification in cell line DNA was determined by quantitative 
PCR. (B) Southern blots containing genomic DNA (10 μg) 
digested with EcoRI (WISP-1) or XbaI (c-myc) were hybridized with 
a 100-bp human WISP-1 probe (amino acids 186-219) or a human 
c-myc probe (located at bp 1901-2000). The WISP and myc genes are 
detected in normal human genomic DNA after a longer film exposure. 

Fig. 6. Genomic amplification of WISP genes in human colon 
tumors. The relative gene copy number of the WISP genes in 25 
adenocarcinomas was assayed by quantitative PCR, by comparing 
DNA from primary human tumors with pooled DNA from 10 healthy 
donors. The data are means ± SEM from one experiment done in 
triplicate. The experiment was repeated at least three times. 

assessed by quantitative PCR (Fig. 7). The level of WISP-1 
RNA present in tumor tissue varied but was significantly 
increased (2- to >25-fold) in 84% (16/19) of the human colon 
tumors examined compared with normal adjacent mucosa. 
Four of 19 tumors showed greater than 10-fold overexpression. 
In contrast, in 79% (15/19) of the tumors examined, WISP-2 
RNA expression was significantly lower in the tumor than the 
mucosa. Similar to WISP-1, WISP-3 RNA was overexpressed in 
63% (12/19) of the colon tumors compared with the normal 

Fig. 7. WISP RNA expression in primary human colon tumors 
relative to expression in normal mucosa from the same patient. 
Expression of WISP mRNA in 19 adenocarcinomas was assayed by 
quantitative PCR. The Dukes stage of the tumor is listed under the 
sample number. The data are means ± SEM from one experiment 
done in triplicate. The experiment was repeated at least twice. 

mucosa. The amount of overexpression of WISP-3 ranged from 
4- to >40-fold. 
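
The tumor fractions quoted above can be checked with simple arithmetic. The short sketch below recomputes each percentage from the counts given in the text (16, 15, and 12 of 19 tumors); the dictionary labels and function name are illustrative choices, not terms from the paper.

```python
TOTAL_TUMORS = 19  # adenocarcinomas with matched normal mucosa

# (number of tumors, percentage stated in the text)
reported = {
    "WISP-1 RNA increased": (16, 84),
    "WISP-2 RNA decreased": (15, 79),
    "WISP-3 RNA increased": (12, 63),
}


def percent_of_tumors(count, total=TOTAL_TUMORS):
    """Return the share of tumors as a whole-number percentage."""
    return round(100 * count / total)


for label, (count, stated) in reported.items():
    computed = percent_of_tumors(count)
    print(f"{label}: {count}/{TOTAL_TUMORS} = {computed}% (text states {stated}%)")
```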



DISCUSSION 

One approach to understanding the molecular basis of cancer 
is to identify differences in gene expression between cancer 
cells and normal cells. Strategies based on assumptions that 
steady-state mRNA levels will differ between normal and 
malignant cells have been used to clone differentially ex- 
pressed genes (31). We have used a PCR-based selection 
strategy, SSH, to identify genes selectively expressed in 
C57MG mouse mammary epithelial cells transformed by 
Wnt-1. 

Three of the genes isolated, WISP-1, WISP-2, and WISP-3, 
are members of the CCN family of growth factors, which 
includes CTGF, Cyr61, and nov, a family not previously linked 
to Wnt signaling. 

Two independent experimental systems demonstrated that 
WISP induction was associated with the expression of Wnt-1. 
The first was C57MG cells infected with a Wnt-1 retroviral 
vector or C57MG cells expressing Wnt-1 under the control of 
a tetracycline-repressible promoter, and the second was in 
Wnt-1 transgenic mice, where breast tissue expresses Wnt-1, 
whereas normal breast tissue does not. No WISP RNA expres- 
sion was detected in mammary tumors induced by polyoma 
virus middle T antigen (data not shown). These data suggest 
a link between Wnt-1 and WISPs in that in these two situations, 
WISP induction was correlated with Wnt-1 expression. 

It is not clear whether the WISPs are directly or indirectly 
induced by the downstream components of the Wnt-1 signaling 
pathway (i.e., β-catenin-TCF-1/Lef1). The increased levels of 
WISP RNA were measured in Wnt-1-transformed cells, hours 
or days after Wnt-1 transformation. Thus, WISP expression 
could result from Wnt-1 signaling directly through β-catenin 
transcription factor regulation or alternatively through Wnt-1 
signaling turning on a transcription factor, which in turn 
regulates WISPs. 

The WISPs define an additional subfamily of the CCN family 
of growth factors. One striking difference observed in the 
protein sequence of WISP-2 is the absence of a CT domain, 
which is present in CTGF, Cyr61, nov, WISP-1, and WISP-3. 
This domain is thought to be involved in receptor binding and 
dimerization. Growth factors, such as TGF-β, platelet-derived 
growth factor, and nerve growth factor, which contain a cystine 
knot motif exist as dimers (32). It is tempting to speculate that 
WISP-1 and WISP-3 may exist as dimers, whereas WISP-2 
exists as a monomer. If the CT domain is also important for 
receptor binding, WISP-2 may bind its receptor through a 
different region of the molecule than the other CCN family 
members. No specific receptors have been identified for CTGF 
or nov. A recent report has shown that integrin αvβ3 serves as 
an adhesion receptor for Cyr61 (33). 

The strong expression of WISP-1 and WISP-2 in cells lying 
within the fibrovascular tumor stroma in breast tumors from 
Wnt-1 transgenic animals is consistent with previous obser- 
vations that transcripts for the related CTGF gene are pri- 
marily expressed in the fibrous stroma of mammary tumors 
(34). Epithelial cells are thought to control the proliferation of 
connective tissue stroma in mammary tumors by a cascade of 
growth factor signals similar to that controlling connective 
tissue formation during wound repair. It has been proposed 
that mammary tumor cells or inflammatory cells at the tumor 
interstitial interface secrete TGF-β1, which is the stimulus for 
stromal proliferation (34). TGF-β1 is secreted by a large 
percentage of malignant breast tumors and may be one of the 
growth factors that stimulates the production of CTGF and 
WISPs in the stroma. 

It was of interest that WISP-1 and WISP-2 expression was 
observed in the stromal cells that surrounded the tumor cells 
(epithelial cells) in the Wnt-1 transgenic mouse sections of 
breast tissue. This finding suggests that paracrine signaling 
could occur in which the stromal cells could supply WISP-1 and 
WISP-2 to regulate tumor cell growth on the WISP extracellular 
matrix. Stromal cell-derived factors in the extracellular 
matrix have been postulated to play a role in tumor cell 
migration and proliferation (35). The localization of WISP-1 
and WISP-2 in the stromal cells of breast tumors supports this 
paracrine model. 

An analysis of WISP-1 gene amplification and expression in 
human colon tumors showed a correlation between DNA 
amplification and overexpression, whereas overexpression of 
WISP-3 RNA was seen in the absence of DNA amplification. 
In contrast, WISP-2 DNA was amplified in the colon tumors, 
but its mRNA expression was significantly reduced in the 
majority of tumors compared with the expression in normal 
colonic mucosa from the same patient. The gene for human 
WISP-2 was localized to chromosome 20q12-20q13, at a region 
frequently amplified and associated with poor prognosis in 
node-negative breast cancer and many colon cancers, suggesting 
the existence of one or more oncogenes at this locus 
(36-38). Because the center of the 20q13 amplicon has not yet 
been identified, it is possible that the apparent amplification 
observed for WISP-2 may be caused by another gene in this 
amplicon. 

A recent manuscript on rCop-1, the rat orthologue of 
WISP-2, describes the loss of expression of this gene after cell 
transformation, suggesting it may be a negative regulator of 
growth in cell lines (16). Although the mechanism by which 
WISP-2 RNA expression is down-regulated during malignant 
transformation is unknown, the reduced expression of WISP-2 
in colon tumors and cell lines suggests that it may function as 
a tumor suppressor. These results show that the WISP genes 
are aberrantly expressed in colon cancer and suggest that their 
altered expression may confer selective growth advantage to 
the tumor. 

Members of the Wnt signaling pathway have been impli- 
cated in the pathogenesis of colon cancer, breast cancer, and 
melanoma, including the tumor suppressor gene adenomatous 
polyposis coli and β-catenin (39). Mutations in specific regions 
of either gene can cause the stabilization and accumulation of 
cytoplasmic β-catenin, which presumably contributes to human 
carcinogenesis through the activation of target genes such 
as the WISPs. Although the mechanism by which Wnt-1 
transforms cells and induces tumorigenesis is unknown, the 
identification of WISPs as genes that may be regulated down- 
stream of Wnt-1 in C57MG cells suggests they could be 
important mediators of Wnt-1 transformation. The amplifica- 
tion and altered expression patterns of the WISPs in human 
colon tumors may indicate an important role for these genes 
in tumor development. 

methods. Peptides AENK or AEQK were dissolved in water, made isotonic with 
NaCl and diluted into RPMI growth medium. T-cell-proliferation assays were 
done essentially as described 20,21. Briefly, after antigen pulsing (30 μg ml⁻¹ 
TTCF) with tetrapeptides (1-2 mg ml⁻¹), PBMCs or EBV-B cells were 
washed in PBS and fixed for 45 s in 0.05% glutaraldehyde. Glycine was added 
to a final concentration of 0.1 M and the cells were washed five times in RPMI 
1640 medium containing 1% FCS before co-culture with T-cell clones in 
round-bottom 96-well microtitre plates. After 48 h, the cultures were pulsed 
with 1 μCi of 3H-thymidine and harvested for scintillation counting 16 h later. 
Predigestion of native TTCF was done by incubating 200 μg TTCF with 0.25 μg 
pig kidney legumain in 500 μl 50 mM citrate buffer, pH 5.5, for 1 h at 37 °C. 
Glycopeptide digestions. The peptides HIDNEEDI, HIDN(N-glucosamine)EEDI 
and HIDNESDI, which are based on the TTCF sequence, and 
QQQHLFGSNVTDCSGNFCLFR(KKK), which is based on human transferrin, 
were obtained by custom synthesis. The three C-terminal lysine residues were 
added to the natural sequence to aid solubility. The transferrin glycopeptide 
QQQHLFGSNVTDCSGNFCLFR was prepared by tryptic (Promega) digestion 
of 5 mg reduced, carboxymethylated human transferrin followed by 
concanavalin A chromatography 11. Glycopeptides corresponding to residues 
622-642 and 421-452 were isolated by reverse-phase HPLC and identified by 
mass spectrometry and N-terminal sequencing. The lyophilized transferrin-derived 
peptides were redissolved in 50 mM sodium acetate, pH 5.5, 10 mM 
dithiothreitol, 20% methanol. Digestions were performed for 3 h at 30 °C with 
5-50 mU ml⁻¹ pig kidney legumain or B-cell AEP. Products were analysed by 
HPLC or MALDI-TOF mass spectrometry using a matrix of 10 mg ml⁻¹ 
α-cyanocinnamic acid in 50% acetonitrile/0.1% TFA and a PerSeptive Biosystems 
Elite STR mass spectrometer set to linear or reflector mode. Internal 
standardization was obtained with a matrix ion of 568.13 mass units. 

Fas ligand (FasL) is produced by activated T cells and natural 
killer cells and it induces apoptosis (programmed cell death) in 
target cells through the death receptor Fas/Apo1/CD95 (ref. 1). 
One important role of FasL and Fas is to mediate immune-cytotoxic 
killing of cells that are potentially harmful to the 
organism, such as virus-infected or tumour cells 1. Here we 
report the discovery of a soluble decoy receptor, termed decoy 
receptor 3 (DcR3), that binds to FasL and inhibits FasL-induced 
apoptosis. The DcR3 gene was amplified in about half of 35 
primary lung and colon tumours studied, and DcR3 messenger 
RNA was expressed in malignant tissue. Thus, certain tumours 
may escape FasL-dependent immune-cytotoxic attack by expres- 
sing a decoy receptor that blocks FasL. 

By searching expressed sequence tag (EST) databases, we identi- 
fied a set of related ESTs that showed homology to the tumour 
necrosis factor (TNF) receptor (TNFR) gene superfamily 2 . Using 
the overlapping sequence, we isolated a previously unknown full- 
length complementary DNA from human fetal lung. We named the 
protein encoded by this cDNA decoy receptor 3 (DcR3). The cDNA 
encodes a 300-amino-acid polypeptide that resembles members of 
the TNFR family (Fig. 1a): the amino terminus contains a leader 
sequence, which is followed by four tandem cysteine-rich domains 
(CRDs). Like one other TNFR homologue, osteoprotegerin (OPG) 3 , 
DcR3 lacks an apparent transmembrane sequence, which indicates 
that it may be a secreted, rather than a membrane-associated, 
molecule. We expressed a recombinant, histidine-tagged form of 
DcR3 in mammalian cells; DcR3 was secreted into the cell culture 
medium, and migrated on polyacrylamide gels as a protein of 
relative molecular mass 35,000 (data not shown). DcR3 shares 
sequence identity in particular with OPG (31%) and TNFR2 
(29%), and has relatively less homology with Fas (17%). All of 
the cysteines in the four CRDs of DcR3 and OPG are conserved; 
however, the carboxy-terminal portion of DcR3 is 101 residues 
shorter. 

We analysed expression of DcR3 mRNA in human tissues by 
northern blotting (Fig. 1b). We detected a predominant 1.2-kilobase 
transcript in fetal lung, brain, and liver, and in adult spleen, colon 
and lung. In addition, we observed relatively high DcR3 mRNA 
expression in the human colon carcinoma cell line SW480. 

To investigate potential ligand interactions of DcR3, we generated 
a recombinant, Fc-tagged DcR3 protein. We tested binding of 
DcR3-Fc to human 293 cells transfected with individual TNF-family 
ligands, which are expressed as type 2 transmembrane 
proteins (these transmembrane proteins have their N termini in 
the cytosol). DcR3-Fc showed a significant increase in binding to 
cells transfected with FasL 4 (Fig. 2a), but not to cells transfected with 
TNF 5, Apo2L/TRAIL 6,7, Apo3L/TWEAK 8,9, or OPGL/TRANCE/RANKL 10-12 
(data not shown). DcR3-Fc immunoprecipitated shed 
FasL from FasL-transfected 293 cells (Fig. 2b) and purified soluble 
FasL (Fig. 2c), as did the Fc-tagged ectodomain of Fas but not 
TNFR1. Gel-filtration chromatography showed that DcR3-Fc and 
soluble FasL formed a stable complex (Fig. 2d). Equilibrium 
analysis indicated that DcR3-Fc and Fas-Fc bound to soluble 
FasL with a comparable affinity (Kd = 0.8 ± 0.2 and 
1.1 ± 0.1 nM, respectively; Fig. 2e), and that DcR3-Fc could 
block nearly all of the binding of soluble FasL to Fas-Fc (Fig. 2e, 
inset). Thus, DcR3 competes with Fas for binding to FasL. 
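
To give a feel for what "comparable affinity" means in practice, the sketch below evaluates a simple one-site equilibrium binding model, fraction bound = [X] / ([X] + Kd), using the two Kd values reported above (0.8 nM for DcR3-Fc and 1.1 nM for Fas-Fc). The single-site model, the neglect of ligand depletion and avidity effects, the function name, and the example concentrations are assumptions made for illustration; the paper's own fitting procedure is not described in this section.

```python
def fraction_bound(conc_nM, kd_nM):
    """Fraction of sites occupied at a free concentration conc_nM.

    Simple one-site equilibrium model (an assumption, not the paper's
    stated analysis): occupancy = conc / (conc + Kd).
    """
    return conc_nM / (conc_nM + kd_nM)


# Mean Kd values as reported in the text (nM).
KD_NM = {"DcR3-Fc": 0.8, "Fas-Fc": 1.1}

# Hypothetical free concentrations at which to compare the two receptors.
for conc in (0.1, 1.0, 10.0):
    row = ", ".join(
        f"{name}: {fraction_bound(conc, kd):.2f}" for name, kd in KD_NM.items()
    )
    print(f"{conc:>5.1f} nM -> {row}")
```

By definition, occupancy is one half when the concentration equals Kd, so under this model the two receptor fusions reach half-saturation at concentrations that differ by well under 2-fold.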

To determine whether binding of DcR3 inhibits FasL activity, we 
tested the effect of DcR3-Fc on apoptosis induction by soluble 
FasL in Jurkat T leukaemia cells, which express Fas (Fig. 3a). DcR3- 
Fc and Fas-Fc blocked soluble-FasL-induced apoptosis in a 
similar dose-dependent manner, with half-maximal inhibition at 
~0.1 μg ml⁻¹. Time-course analysis showed that the inhibition did 
not merely delay cell death, but rather persisted for at least 24 hours 
(Fig. 3b). We also tested the effect of DcR3-Fc on activation-induced 
cell death (AICD) of mature T lymphocytes, a FasL-dependent 
process 1. Consistent with previous results 13, activation 
of interleukin-2-stimulated CD4-positive T cells with anti-CD3 
antibody increased the level of apoptosis twofold, and Fas-Fc 
blocked this effect substantially (Fig. 3c); DcR3-Fc blocked the 



induction of apoptosis to a similar extent. Thus, DcR3 binding 
blocks apoptosis induction by FasL. 

FasL-induced apoptosis is important in elimination of virus- 
infected cells and cancer cells by natural killer cells and cytotoxic T 
lymphocytes; an alternative mechanism involves perforin and 
granzymes 1,14-16. Peripheral blood natural killer cells triggered 
marked cell death in Jurkat T leukaemia cells (Fig. 3d); DcR3-Fc 
and Fas-Fc each reduced killing of target cells from ~65% to 
~30%, with half-maximal inhibition at ~1 μg ml⁻¹; the residual 
killing was probably mediated by the perforin/granzyme pathway. 
Thus, DcR3 binding blocks FasL-dependent natural killer cell 
activity. Higher DcR3-Fc and Fas-Fc concentrations were required 
to block natural killer cell activity compared with those required to 
block soluble FasL activity, which is consistent with the greater 
potency of membrane-associated FasL compared with soluble 
FasL 17 . 

Given the role of immune-cytotoxic cells in elimination of 
tumour cells and the fact that DcR3 can act as an inhibitor of 
FasL, we proposed that DcR3 expression might contribute to the 
ability of some tumours to escape immune-cytotoxic attack. As 
genomic amplification frequently contributes to tumorigenesis, we 
investigated whether the DcR3 gene is amplified in cancer. We 
analysed DcR3 gene-copy number by quantitative polymerase chain 

Figure 1 Primary structure and expression of human DcR3. a, Alignment of the 
amino-acid sequences of DcR3 and of osteoprotegerin (OPG); the C-terminal 101 
residues of OPG are not shown. The putative signal cleavage site (arrow), the 
cysteine-rich domains (CRD 1-4), and the N-linked glycosylation site (asterisk) are 
shown. b, Expression of DcR3 mRNA. Northern hybridization analysis was done 
using the DcR3 cDNA as a probe and blots of poly(A)+ RNA (Clontech) from 
human fetal and adult tissues or cancer cell lines. PBL, peripheral blood 
lymphocyte. 

Figure 2 Interaction of DcR3 with FasL. a, 293 cells were transfected with pRK5 
vector (top) or with pRK5 encoding full-length FasL (bottom), incubated with 
DcR3-Fc (solid line, shaded area), TNFR1-Fc (dotted line) or buffer control 
(dashed line) (the dashed and dotted lines overlap), and analysed for binding by 
FACS. Statistical analysis showed a significant difference (P < 0.001) between the 
binding of DcR3-Fc to cells transfected with FasL or pRK5. PE, phycoerythrin-labelled 
cells. b, 293 cells were transfected as in a and metabolically labelled, and 
cell supernatants were immunoprecipitated with Fc-tagged TNFR1, DcR3 or Fas. 
c, Purified soluble FasL (sFasL) was immunoprecipitated with TNFR1-Fc, DcR3-Fc 
or Fas-Fc and visualized by immunoblot with anti-FasL antibody. sFasL was 
loaded directly for comparison in the right-hand lane. d, Flag-tagged sFasL was 
incubated with DcR3-Fc or with buffer and resolved by gel filtration; column 
fractions were analysed in an assay that detects complexes containing DcR3-Fc 
and sFasL-Flag. e, Equilibrium binding of DcR3-Fc or Fas-Fc to sFasL-Flag. 
Inset, competition of DcR3-Fc with Fas-Fc for binding to sFasL-Flag. 

reaction (PCR) 18 in genomic DNA from 35 primary lung and colon 
tumours, relative to pooled genomic DNA from peripheral blood 
leukocytes (PBLs) of 10 healthy donors. Eight of 18 lung tumours 
and 9 of 17 colon tumours showed DcR3 gene amplification, 
ranging from 2- to 18-fold (Fig. 4a, b). To confirm this result, we 
analysed the colon tumour DNAs with three more, independent sets 
of DcR3-based PCR primers and probes; we observed nearly the 
same amplification (data not shown). 

We then analysed DcR3 mRNA expression in primary tumour 
tissue sections by in situ hybridization. We detected DcR3 expres- 
sion in 6 out of 15 lung tumours, 2 out of 2 colon tumours, 2 out of 5 
breast tumours, and 1 out of 1 gastric tumour (data not shown). A 
section through a squamous-cell carcinoma of the lung is shown in 
Fig. 4c. DcR3 mRNA was localized to infiltrating malignant epithe- 
lium, but was essentially absent from adjacent stroma, indicating 
tumour-specific expression. Although the individual tumour speci- 
mens that we analysed for mRNA expression and gene amplification 
were different, the in situ hybridization results are consistent with 
the finding that the DcR3 gene is amplified frequently in tumours. 
SW480 colon carcinoma cells, which showed abundant DcR3 
mRNA expression (Fig. 1b), also had marked DcR3 gene amplification, 
as shown by quantitative PCR (fourfold) and by Southern blot 
hybridization (fivefold) (data not shown). 

If DcR3 amplification in cancer is functionally relevant, then 
DcR3 should be amplified more than neighbouring genomic 
regions that are not important for tumour survival. To test this, 



we mapped the human DcR3 gene by radiation-hybrid analysis; 
DcR3 showed linkage to marker AFM218xe7 (T160), which maps to 
chromosome position 20q13. Next, we isolated from a bacterial 
artificial chromosome (BAC) library a human genomic clone that 
carries DcR3, and sequenced the ends of the clone's insert. We then 
determined, from the nine colon tumours that showed twofold or 
greater amplification of DcR3, the copy number of the DcR3- 
flanking sequences (reverse and forward) from the BAC, and of 
seven genomic markers that span chromosome 20 (Fig. 4d). The 
DcR3-linked reverse marker showed an average amplification of 
roughly threefold, slightly less than the approximately fourfold 
amplification of DcR3; the other markers showed little or no 
amplification. These data indicate that DcR3 may be at the 'epi- 
centre' of a distal chromosome 20 region that is amplified in colon 
cancer, consistent with the possibility that DcR3 amplification 
promotes tumour survival. 

Our results show that DcR3 binds specifically to FasL and inhibits 
FasL activity. We did not detect DcR3 binding to several other 
TNF-ligand-family members; however, this does not rule out the 
possibility that DcR3 interacts with other ligands, as do some other 
TNFR family members, including OPG 2,19 . 

FasL is important in regulating the immune response; however, 
little is known about how FasL function is controlled. One mechan- 
ism involves the molecule cFLIP, which modulates apoptosis signal- 
ling downstream of Fas 20 . A second mechanism involves proteolytic 
shedding of FasL from the cell surface 17 . DcR3 competes with Fas for 

Figure 3 Inhibition of FasL activity by DcR3. a, Human Jurkat T leukaemia cells 
were incubated with Flag-tagged soluble FasL (sFasL; 5 ng ml⁻¹) oligomerized 
with anti-Flag antibody (0.1 µg ml⁻¹) in the presence of the proposed inhibitors 
DcR3-Fc, Fas-Fc or human IgG1 and assayed for apoptosis (mean ± s.e.m. of 
triplicates). b, Jurkat cells were incubated with sFasL-Flag plus anti-Flag antibody 
as in a, in presence of 1 µg ml⁻¹ DcR3-Fc (filled circles), Fas-Fc (open circles) or 
human IgG1 (triangles), and apoptosis was determined at the indicated time 
points. c, Peripheral blood T cells were stimulated with PHA and interleukin-2, 
followed by control (white bars) or anti-CD3 antibody (filled bars), together with 
phosphate-buffered saline (PBS), human IgG1, Fas-Fc, or DcR3-Fc (10 ng ml⁻¹). 
After 16 h, apoptosis of CD4+ cells was determined (mean ± s.e.m. of results from 
five donors). d, Peripheral blood natural killer cells were incubated with 51Cr-labelled 
Jurkat cells in the presence of DcR3-Fc (filled circles), Fas-Fc (open 
circles) or human IgG1 (triangles), and target-cell death was determined by 
release of 51Cr (mean ± s.d. for two donors, each in triplicate). 



Figure 4 Genomic amplification of DcR3 in tumours. a, Lung cancers, comprising 
eight adenocarcinomas (c, d, f, g, h, j, k, r), seven squamous-cell carcinomas (a, e, 
m, n, o, p, q), one non-small-cell carcinoma (b), one small-cell carcinoma (i), and 
one bronchial adenocarcinoma (l). The data are means ± s.d. of 2 experiments 
done in duplicate. b, Colon tumours, comprising 17 adenocarcinomas. Data are 
means ± s.e.m. of five experiments done in duplicate. c, In situ hybridization 
analysis of DcR3 mRNA expression in a squamous-cell carcinoma of the lung. A 
representative bright-field image (left) and the corresponding dark-field image 
(right) show DcR3 mRNA over infiltrating malignant epithelium (arrowheads). 
Adjacent non-malignant stroma (S), blood vessel (V) and necrotic tumour tissue 
(N) are also shown. d, Average amplification of DcR3 compared with amplification 
of neighbouring genomic regions (reverse and forward, Rev and Fwd), the 
DcR3-linked marker T160, and other chromosome-20 markers, in the nine colon 
tumours showing DcR3 amplification of twofold or more (b). Data are from two 
experiments done in duplicate. Asterisk indicates P < 0.01 for a Student's t-test 
comparing each marker with DcR3. 

FasL binding; hence, it may represent a third mechanism of 
extracellular regulation of FasL activity. A decoy receptor that 
modulates the function of the cytokine interleukin-1 has been 
described 21 . In addition, two decoy receptors that belong to the 
TNFR family, DcR1 and DcR2, regulate the FasL-related apoptosis-inducing 
molecule Apo2L 22 . Unlike DcR1 and DcR2, which are 
membrane-associated proteins, DcR3 is directly secreted into the 
extracellular space. One other secreted TNFR-family member is 
OPG 3 , which shares greater sequence homology with DcR3 (31%) 
than do DcR1 (17%) or DcR2 (19%); OPG functions as a third 
decoy for Apo2L 19 . Thus, DcR3 and OPG define a new subset of 
TNFR-family members that function as secreted decoys to mod- 
ulate ligands that induce apoptosis. Pox viruses produce soluble 
TNFR homologues that neutralize specific TNF-family ligands, 
thereby modulating the antiviral immune response 2 . Our results 
indicate that a similar mechanism, namely, production of a soluble 
decoy receptor for FasL, may contribute to immune evasion by 
certain tumours. 

Methods 

Isolation of DcR3 cDNA. Several overlapping ESTs in GenBank (accession 
numbers AA025672, AA025673 and W67560) and in Lifeseq™ (Incyte 
Pharmaceuticals; accession numbers 1339238, 1533571, 1533650, 1542861, 
1789372 and 2207027) showed similarity to members of the TNFR family. We 
screened human cDNA libraries by PCR with primers based on the region of 
EST consensus; fetal lung was positive for a product of the expected size. By 
hybridization to a PCR-generated probe based on the ESTs, one positive clone 
(DNA30942) was identified. When searching for potential alternatively spliced 
forms of DcR3 that might encode a transmembrane protein, we isolated 50 
more clones; the coding regions of these clones were identical in size to that of 
the initial clone (data not shown). 

Fc-fusion proteins (immunoadhesins). The entire DcR3 sequence, or the 
ectodomain of Fas or TNFR1, was fused to the hinge and Fc region of human 
IgG1, expressed in insect SF9 cells or in human 293 cells, and purified as 
described 23 . 

Fluorescence-activated cell sorting (FACS) analysis. We transfected 293 
cells using calcium phosphate or Effectene (Qiagen) with pRK5 vector or pRK5 
encoding full-length human FasL 4 (2 µg), together with pRK5 encoding CrmA 
(2 µg) to prevent cell death. After 16 h, the cells were incubated with 
biotinylated DcR3-Fc or TNFR1-Fc and then with phycoerythrin-conjugated 
streptavidin (GibcoBRL), and were assayed by FACS. The data were analysed by 
Kolmogorov-Smirnov statistical analysis. There was some detectable staining 
of vector-transfected cells by DcR3-Fc; as these cells express little FasL (data 
not shown), it is possible that DcR3 recognized some other factor that is 
expressed constitutively on 293 cells. 

Immunoprecipitation. Human 293 cells were transfected as above, and 
metabolically labelled with [35S]cysteine and [35S]methionine (0.5 mCi; 
Amersham). After 16 h of culture in the presence of z-VAD-fmk (10 µM), 
the medium was immunoprecipitated with DcR3-Fc, Fas-Fc or TNFR1-Fc 
(5 µg), followed by protein A-Sepharose (Repligen). The precipitates were 
resolved by SDS-PAGE and visualized on a phosphorimager (Fuji BAS2000). 
Alternatively, purified, Flag-tagged soluble FasL (1 µg) (Alexis) was incubated 
with each Fc-fusion protein (1 µg), precipitated with protein A-Sepharose, 
resolved by SDS-PAGE and visualized by immunoblotting with rabbit anti- 
FasL antibody (Oncogene Research). 

Analysis of complex formation. Flag-tagged soluble FasL (25 µg) was 
incubated with buffer or with DcR3-Fc (40 µg) for 1.5 h at 24 °C. The reaction 
was loaded onto a Superdex 200 HR 10/30 column (Pharmacia) and developed 
with PBS; 0.6-ml fractions were collected. The presence of DcR3-Fc-FasL 
complex in each fraction was analysed by placing 100 µl aliquots into microtitre 
wells precoated with anti-human IgG (Boehringer) to capture DcR3-Fc, 
followed by detection with biotinylated anti-Flag antibody Bio M2 (Kodak) and 
streptavidin-horseradish peroxidase (Amersham). Calibration of the column 
indicated an apparent relative molecular mass of the complex of 420K (data not 
shown), which is consistent with a stoichiometry of two DcR3-Fc homodimers 
to two soluble FasL homotrimers. 

Equilibrium binding analysis. Microtitre wells were coated with anti-human 
IgG and blocked with 2% BSA in PBS. DcR3-Fc or Fas-Fc was added, followed by 
serially diluted Flag-tagged soluble FasL. Bound ligand was detected with anti-Flag 
antibody as above. In the competition assay, Fas-Fc was immobilized as 
above, and the wells were blocked with excess IgG1 before addition of 
Flag-tagged soluble FasL plus DcR3-Fc. 

T-cell AICD. CD3+ lymphocytes were isolated from peripheral blood of 
individual donors using anti-CD3 magnetic beads (Miltenyi Biotech), 
stimulated with phytohaemagglutinin (PHA; 2 µg ml⁻¹) for 24 h, and cultured 
in the presence of interleukin-2 (100 U ml⁻¹) for 5 days. The cells were plated in 
wells coated with anti-CD3 antibody (Pharmingen) and analysed for apoptosis 
16 h later by FACS analysis of annexin-V binding of CD4+ cells. 

Natural killer cell activity. Natural killer cells were isolated from peripheral 
blood of individual donors using anti-CD56 magnetic beads (Miltenyi 
Biotech), and incubated for 16 h with 51Cr-loaded Jurkat cells at an 
effector-to-target ratio of 1:1 in the presence of DcR3-Fc, Fas-Fc or human IgG1. 
Target-cell death was determined by release of 51Cr in effector-target 
co-cultures relative to release of 51Cr by detergent lysis of equal numbers of Jurkat 
cells. 

Gene-amplification analysis. Surgical specimens were provided by J. Kern 
(lung tumours) and P. Quirke (colon tumours). Genomic DNA was extracted 
(Qiagen) and the concentration was determined using Hoechst dye 33258 
intercalation fluorometry. Amplification was determined by quantitative PCR 18 
using a TaqMan instrument (ABI). The method was validated by comparison of 
PCR and Southern hybridization data for the Myc and HER-2 oncogenes (data 
not shown). Gene-specific primers and fluorogenic probes were designed on 
the basis of the sequence of DcR3 or of nearby regions identified on a BAC 
carrying the human DcR3 gene; alternatively, primers and probes were based 
on Stanford Human Genome Center marker AFM218xe7 (T160), which is 
linked to DcR3 (likelihood score = 5.4), SHGC-36268 (T159), the nearest 
available marker which maps to ~500 kilobases from T160, and five extra 
markers that span chromosome 20. The DcR3-specific primer sequences were 
5'-CTTCTTCGCGCACGCTG-3' and 5'-ATCACGCCGGCACCAG-3' and the 
fluorogenic probe sequence was 5'-(FAM)-ACACGATGCGTGCTCCAAGCAGAAp-(TAMRA), 
where FAM is 5'-fluorescein phosphoramidite. Relative 
gene-copy numbers were derived using the formula 2^(ΔCt), where ΔCt is the 
difference in amplification cycles required to detect DcR3 in peripheral blood 
lymphocyte DNA compared to test DNA. 
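
As a minimal sketch of the copy-number formula above, the calculation can be written in a few lines of Python. The function name and the example cycle values are hypothetical and are not data from this study; the sketch also assumes ideal amplification, in which the product doubles with every cycle.

```python
def relative_copy_number(ct_reference: float, ct_test: float) -> float:
    """Return 2 ** (delta Ct), with delta Ct = Ct(reference DNA) - Ct(test DNA).

    Assumes ideal PCR efficiency (exact doubling per cycle), which is an
    assumption of this sketch rather than a measured property.
    """
    delta_ct = ct_reference - ct_test
    return 2.0 ** delta_ct


if __name__ == "__main__":
    # Hypothetical example: the test DNA is detected two cycles earlier
    # than the peripheral blood lymphocyte reference DNA.
    print(relative_copy_number(ct_reference=27.0, ct_test=25.0))
```

A test sample detected earlier than the reference yields a value above 1, and a sample detected at the same cycle yields exactly 1.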

ABC transporters (also known as traffic ATPases) form a large 
family of proteins responsible for the translocation of a variety 
of compounds across membranes of both prokaryotes and 
eukaryotes 1 . The recently completed Escherichia coli genome 
sequence revealed that the largest family of paralogous E. coli 
proteins is composed of ABC transporters 2 . Many eukaryotic 
proteins of medical significance belong to this family, such as 
the cystic fibrosis transmembrane conductance regulator (CFTR), 
the P-glycoprotein (or multidrug-resistance protein) and the 
heterodimeric transporter associated with antigen processing 
(Tap1-Tap2). Here we report the crystal structure at 1.5 Å resolution 
of HisP, the ATP-binding subunit of the histidine permease, 
which is an ABC transporter from Salmonella typhimurium. We 
correlate the details of this structure with the biochemical, genetic 
and biophysical properties of the wild-type and several mutant 
HisP proteins. The structure provides a basis for understanding 
properties of ABC transporters and of defective CFTR proteins. 

ABC transporters contain four structural domains: two nucleo- 
tide-binding domains (NBDs), which are highly conserved 
throughout the family, and two transmembrane domains'. In 
prokaryotes these domains are often separate subunits which are 
assembled into a membrane-bound complex; in eukaryotes the 
domains are generally fused into a single polypeptide chain. The 
periplasmic histidine permease of S. typhimurium and E. coli is a 
well-characterized ABC transporter that is a good model for this 
superfamily. It consists of a membrane-bound complex, HisQMP2, 
which comprises integral membrane subunits, HisQ and HisM, and 
two copies of HisP, the ATP-binding subunit. HisP, which has 
properties intermediate between those of integral and peripheral 
membrane proteins 9 , is accessible from both sides of the membrane, 
presumably by its interaction with HisQ and HisM 6 . The two HisP 
subunits form a dimer, as shown by their cooperativity in ATP 
hydrolysis 5 , the requirement for both subunits to be present for 
activity*, and the formation of a HisP dimer upon chemical cross- 
linking. Soluble HisP also forms a dimer 3 . HisP has been purified 
and characterized in an active soluble form 3 which can be recon- 
stituted into a fully active membrane-bound complex 8 . 

The overall shape of the crystal structure of the HisP monomer is 
that of an 'L' with two thick arms (arm I and arm II); the ATP-binding 
pocket is near the end of arm I (Fig. 1). A six-stranded β-sheet 
(β3 and β8-β12) spans both arms of the L, with a domain of 
α- plus β-type structure (β1, β2, β4-β7, α1 and α2) on one side 
(within arm I) and a domain of mostly α-helices (α3-α9) on the 




Figure 1 Crystal structure of HisP. a, View of the dimer along an axis 
perpendicular to its two-fold axis. The top and bottom of the dimer are suggested 
to face towards the periplasmic and cytoplasmic sides, respectively (see text). 
The thickness of arm II is about 25 Å, comparable to that of membrane. α-Helices 
are shown in orange and β-sheets in green. b, View along the two-fold axis of the 
HisP dimer, showing the relative displacement of the monomers not apparent in 
a. The β-strands at the dimer interface are labelled. c, View of one monomer from 
the bottom of arm I, as shown in a, towards arm II, showing the ATP-binding 
pocket. a-c, The protein and the bound ATP are in 'ribbon' and 'ball-and-stick' 
representations, respectively. Key residues discussed in the text are indicated in 
c. These figures were prepared with MOLSCRIPT* 9 . N, amino terminus; C, C 
terminus. 

Gene amplification is a common event in the progression of 
human cancers, and amplified oncogenes have been shown to 
have diagnostic, prognostic and therapeutic relevance. A 
kinetic quantitative polymerase-chain-reaction (PCR) method, 
based on fluorescent TaqMan methodology and a new instru- 
ment (ABI Prism 7700 Sequence Detection System) capable 
of measuring fluorescence in real-time, was used to quantify 
gene amplification in tumor DNA. Reactions are character- 
ized by the point during cycling when PCR amplification is still 
in the exponential phase, rather than the amount of PCR 
product accumulated after a fixed number of cycles. None of 
the reaction components is limited during the exponential 
phase, meaning that values are highly reproducible in reac- 
tions starting with the same copy number. This greatly 
improves the precision of DNA quantification. Moreover, 
real-time PCR does not require post-PCR sample handling, 
thereby preventing potential PCR-product carry-over con- 
tamination; it possesses a wide dynamic range of quantifica- 
tion and results in much faster and higher sample throughput. 
The real-time PCR method was used to develop and validate 
a simple and rapid assay for the detection and quantification 
of the 3 most frequently amplified genes (myc, ccnd1 and 
erbB2) in breast tumors. Extra copies of myc, ccnd1 and erbB2 
were observed in 10, 23 and 15%, respectively, of 108 breast- 
tumor DNA; the largest observed numbers of gene copies 
were 4.6, 18.6 and 15.1, respectively. These results correlated 
well with those of Southern blotting. The use of this new 
semi-automated technique will make molecular analysis of 
human cancers simpler and more reliable, and should find 
broad applications in clinical and research settings. Int. J. 
Cancer 78:661-666, 1998. 
© 1998 Wiley-Liss, Inc. 

Gene amplification plays an important role in the pathogenesis 
of various solid tumors, including breast cancer, probably because 
over-expression of the amplified target genes confers a selective 
advantage. The first technique used to detect genomic amplification 
was cytogenetic analysis. Amplification of several chromosome 
regions, visualized either as extrachromosomal double minutes 
(dmins) or as integrated homogeneously staining regions (HSRs), 
is among the main visible cytogenetic abnormalities in breast 
tumors. Other techniques such as comparative genomic hybridization 
(CGH) (Kallioniemi et al., 1994) have also been used in broad 
searches for regions of increased DNA copy numbers in tumor 
cells, and have revealed some 20 amplified chromosome regions in 
breast tumors. Positional cloning efforts are underway to identify 
the critical gene(s) in each amplified region. To date, genes known 
to be amplified frequently in breast cancers include myc (8q24), 
ccnd1 (11q13), and erbB2 (17q12-q21) (for review, see Bieche and 
Lidereau, 1995). 

Amplification of the myc, ccnd1, and erbB2 proto-oncogenes 
should have clinical relevance in breast cancer, since independent 
studies have shown that these alterations can be used to identify 
sub-populations with a worse prognosis (Berns et al., 1992; 
Schuuring et al., 1992; Slamon et al., 1987). Muss et al. (1994) 
suggested that these gene alterations may also be useful for the 
prediction and assessment of the efficacy of adjuvant chemotherapy 
and hormone therapy. 

However, published results diverge both in terms of the fre- 
quency of these alterations and their clinical value. For instance, 
over 500 studies in 10 years have failed to resolve the controversy 
surrounding the link suggested by Slamon et al. (1987) between 
erbB2 amplification and disease progression. These discrepancies 
are partly due to the clinical, histological and ethnic heterogeneity 
of breast cancer, but technical considerations are also probably 
involved. 

Specific genes (DNA) were initially quantified in tumor cells by 
means of blotting procedures such as Southern and slot blotting. 
These batch techniques require large amounts of DNA (5-10 
µg/reaction) to yield reliable quantitative results. Furthermore, 
meticulous care is required at all stages of the procedures to 
generate blots of sufficient quality for reliable dosage analysis. 
Recently, PCR has proven to be a powerful tool for quantitative 
DNA analysis, especially with minimal starting quantities of tumor 
samples (small, early-stage tumors and formalin-fixed, paraffin- 
embedded tissues). 

Quantitative PCR can be performed by evaluating the amount of 
product either after a given number of cycles (end-point quantita- 
tive PCR) or after a varying number of cycles during the 
exponential phase (kinetic quantitative PCR). In the first case, an 
internal standard distinct from the target molecule is required to 
ascertain PCR efficiency. The method is relatively easy but implies 
generating, quantifying and storing an internal standard for each 
gene studied. Nevertheless, it is the most frequently applied 
method to date. 

One of the major advantages of the kinetic method is its rapidity 
in quantifying a new gene, since no internal standard is required (an 
external standard curve is sufficient). Moreover, the kinetic method 
has a wide dynamic range (at least 5 orders of magnitude), giving 
an accurate value for samples differing in their copy number. 
Unfortunately, the method is cumbersome and has therefore been 
rarely used. It involves aliquot sampling of each assay mix at 
regular intervals and quantifying, for each aliquot, the amplifica- 
tion product. Interest in the kinetic method has been stimulated by a 
novel approach using fluorescent TaqMan methodology and a new 
instrument (ABI Prism 7700 Sequence Detection System) capable 
of measuring fluorescence in real time (Gibson et al, 1996; Heid et 
al, 1996). The TaqMan reaction is based on the 5' nuclease assay 
first described by Holland et al (1991). The latter uses the 5' 
nuclease activity of Taq polymerase to cleave a specific fluorogenic 
oligonucleotide probe during the extension phase of PCR. The 
approach uses dual-labeled fluorogenic hybridization probes (Lee 
et al., 1993). One fluorescent dye, covalently linked to the 5' end 
of the oligonucleotide, serves as a reporter [FAM (i.e., 6-carboxy- 
fluorescein)] and its emission spectrum is quenched by a second 
fluorescent dye, TAMRA (i.e., 6-carboxy-tetramethyl-rhodamine) 
attached to the 3' end. During the extension phase of the PCR 
cycle, the fluorescent hybridization probe is hydrolyzed by the 
5'-3' nucleolytic activity of DNA polymerase. Nuclease degrada- 
tion of the probe releases the quenching of FAM fluorescence 
emission, resulting in an increase in peak fluorescence emission. 
The fluorescence signal is normalized by dividing the emission 
intensity of the reporter dye (FAM) by the emission intensity of a 
reference dye (i.e., ROX, 6-carboxy-X-rhodamine) included in 
TaqMan buffer, to obtain a ratio defined as the Rn (normalized 
reporter) for a given reaction tube. The use of a sequence detector 
enables the fluorescence spectra of all 96 wells of the thermal 
cycler to be measured continuously during PCR amplification. 
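
The normalization described above can be illustrated with a short Python sketch. It assumes that the reporter (FAM) and reference (ROX) emission intensities are available as one value per cycle for a given reaction tube; the function name and data layout are hypothetical and do not describe the instrument software.

```python
from typing import List, Sequence


def normalized_reporter(fam: Sequence[float], rox: Sequence[float]) -> List[float]:
    """Return Rn per cycle: FAM emission divided by ROX emission.

    Both traces are assumed to come from the same reaction tube and to
    contain one intensity value per PCR cycle.
    """
    if len(fam) != len(rox):
        raise ValueError("FAM and ROX traces must cover the same cycles")
    return [f / r for f, r in zip(fam, rox)]


if __name__ == "__main__":
    # Hypothetical intensities for three cycles of one tube.
    print(normalized_reporter([2.0, 4.0, 8.0], [1.0, 2.0, 2.0]))
```

Dividing by the reference dye compensates for tube-to-tube differences in volume or optics that affect both dyes equally.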

The real-time PCR method offers several advantages over other 
current quantitative PCR methods (Celi et al., 1994): (i) the 
probe-based homogeneous assay provides a real-time method for 
detecting only specific amplification products, since specific 
hybridization of both the primers and the probe is necessary to generate a 
signal; (ii) the Ct (threshold cycle) value used for quantification is 
measured when PCR amplification is still in the log phase of PCR 
product accumulation. This is the main reason why Ct is a more 
reliable measure of the starting copy number than are end-point 
measurements, in which a slight difference in a limiting component 
can have a drastic effect on the amount of product; (iii) use of Ct 
values gives a wider dynamic range (at least 5 orders of magnitude), 
reducing the need for serial dilution; (iv) the real-time PCR 
method is run in a closed-tube system and requires no post-PCR 
sample handling, thus avoiding potential contamination; (v) the 
system is highly automated, since the instrument continuously 
measures fluorescence in all 96 wells of the thermal cycler during 
PCR amplification and the corresponding software processes and 
analyzes the fluorescence data; (vi) the assay is rapid, as results are 
available just one minute after thermal cycling is complete; (vii) the 
sample throughput of the method is high, since 96 reactions can be 
analyzed in 2 hr. 

Here, we applied this semi-automated procedure to determine 
the copy numbers of the 3 most frequently amplified genes in breast 
tumors (myc, ccnd1 and erbB2), as well as 2 genes (alb and app) 
located in chromosome regions in which no genetic changes have 
been observed in breast tumors. The results for 108 breast tumors 
were compared with previous Southern-blot data for the same 
samples. 



MATERIAL AND METHODS 
Tumor and blood samples 

Samples were obtained from 108 primary breast tumors removed 
surgically from patients at the Centre Rene Huguenin; none of the 
patients had undergone radiotherapy or chemotherapy. Immediately 
after surgery, the tumor samples were placed in liquid 
nitrogen until extraction of high-molecular-weight DNA. Patients 
were included in this study if the tumor sample used for DNA 
preparation contained more than 60% of tumor cells (histological 
analysis). A blood sample was also taken from 18 of the same 
patients. 

DNA was extracted from tumor tissue and blood leukocytes 
according to standard methods. 

Real-time PCR 

Theoretical basis. Reactions are characterized by the point 
during cycling when amplification of the PCR product is first 
detected, rather than by the amount of PCR product accumulated 
after a fixed number of cycles. The higher the starting copy number 
of the genomic DNA target, the earlier a significant increase in 
fluorescence is observed. The parameter Ct (threshold cycle) is 
defined as the fractional cycle number at which the fluorescence 
generated by cleavage of the probe passes a fixed threshold above 
baseline. The target gene copy number in unknown samples is 
quantified by measuring Ct and by using a standard curve to 
determine the starting copy number. The precise amount of 
genomic DNA (based on optical density) and its quality (i.e., lack 
of extensive degradation) are both difficult to assess. We therefore 
also quantified a control gene (alb) mapping to chromosome region 
4q11-q13, in which no genetic alterations have been found in 
breast-tumor DNA by means of CGH (Kallioniemi et al., 1994). 
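
The definition of the threshold cycle as a fractional cycle number can be made concrete with a small Python sketch. It assumes a baseline-corrected fluorescence trace with one value per cycle and uses linear interpolation between the two cycles that flank the threshold. The interpolation scheme and the function name are assumptions made for illustration, not the algorithm of the instrument software.

```python
from typing import Optional, Sequence


def threshold_cycle(signal: Sequence[float], threshold: float) -> Optional[float]:
    """Return the fractional cycle at which the signal first crosses threshold.

    signal[i] is the baseline-corrected fluorescence after cycle i + 1.
    Returns None when the trace never rises through the threshold.
    """
    for i in range(1, len(signal)):
        prev, curr = signal[i - 1], signal[i]
        if prev < threshold <= curr:
            # prev belongs to cycle i and curr to cycle i + 1 (1-based cycles).
            fraction = (threshold - prev) / (curr - prev)
            return i + fraction
    return None


if __name__ == "__main__":
    # Hypothetical trace: the threshold of 0.5 is crossed between cycles 3 and 4.
    print(threshold_cycle([0.0, 0.0, 0.25, 0.75], threshold=0.5))
```

A sample with more starting copies crosses the threshold earlier and therefore returns a smaller value.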

Thus, the ratio of the copy number of the target gene to the copy 
number of the alb gene normalizes the amount and quality of 
genomic DNA. The ratio defining the level of amplification is 
termed "N", and is determined as follows: 

N = [copy number of target gene (app, myc, ccnd1, erbB2)] / [copy number of reference gene (alb)] 
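
The ratio N can be computed as in the following minimal Python sketch. The function name and the example copy numbers are hypothetical and are not results from this study.

```python
def gene_dose(target_copies: float, reference_copies: float) -> float:
    """Return N, the target-gene copy number divided by the alb copy number."""
    if reference_copies <= 0:
        raise ValueError("reference copy number must be positive")
    return target_copies / reference_copies


if __name__ == "__main__":
    # Hypothetical sample: 19800 target copies against 6600 alb copies.
    print(gene_dose(target_copies=19800.0, reference_copies=6600.0))
```

Because both copy numbers are measured in the same DNA preparation, errors in the amount or quality of input DNA largely cancel in the ratio.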

Primers, probes, reference human genomic DNA and PCR 
consumables. Primers and probes were chosen with the assistance 
of the computer programs Oligo 4.0 (National Biosciences, Ply- 
mouth, MN), EuGene (Daniben Systems, Cincinnati, OH) and Primer 
Express (Perkin-Elmer Applied Biosystems, Foster City, CA). 

Primers were purchased from DNAgency (Malvern, PA) and 
probes from Perkin-Elmer Applied Biosystems. 

Nucleotide sequences for the oligonucleotide hybridization 
probes and primers are available on request. 

The TaqMan PCR Core reagent kit, MicroAmp optical tubes, 
and MicroAmp caps were from Perkin-Elmer Applied Biosystems. 

Standard-curve construction. The kinetic method requires a 
standard curve. The latter was constructed with serial dilutions of 
specific PCR products, according to Piatak et al. (1993). In 
practice, each specific PCR product was obtained by amplifying 20 
ng of a standard human genomic DNA (Boehringer, Mannheim, 
Germany) with the same primer pairs as those used later for 
real-time quantitative PCR. The 5 PCR products were purified 
using MicroSpin S-400 HR columns (Pharmacia, Uppsala, Sweden), 
electrophoresed through an acrylamide gel and stained with 
ethidium bromide to check their quality. The PCR products were 
then quantified spectrophotometrically and pooled, and serially 
diluted 10-fold in mouse genomic DNA (Clontech, Palo Alto, CA) 
at a constant concentration of 2 ng/µl. The standard curve used for 
real-time quantitative PCR was based on serial dilutions of the pool 
of PCR products ranging from 10^-7 (10^5 copies of each gene) to 
10^-10 (10^2 copies). This series of diluted PCR products was 
aliquoted and stored at -80°C until use. 
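
How a standard curve of this kind converts a measured Ct into a starting copy number can be sketched in Python as follows. The sketch fits a straight line to Ct against log10(copies) by ordinary least squares and then inverts it. The Ct values in the example are hypothetical, and the fitting method is an assumption for illustration rather than a description of the instrument software.

```python
import math
from typing import Sequence, Tuple


def fit_standard_curve(copies: Sequence[float], cts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the fitted line to estimate the starting copy number."""
    return 10.0 ** ((ct - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical standards at 10^5 to 10^2 copies, 3 cycles per decade.
    standards = [1e5, 1e4, 1e3, 1e2]
    measured_ct = [20.0, 23.0, 26.0, 29.0]
    slope, intercept = fit_standard_curve(standards, measured_ct)
    print(slope, intercept, copies_from_ct(24.5, slope, intercept))
```

The slope is negative because more starting copies give an earlier threshold cycle.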

The standard curve was validated by analyzing 2 known 
quantities of calibrator human genomic DNA (20 ng and 50 ng). 

PCR amplification. Amplification mixes (50 µl) contained the 
sample DNA (around 20 ng, around 6600 copies of disomic genes), 
10X TaqMan buffer (5 µl), 200 µM dATP, dCTP, dGTP, and 400 
µM dUTP, 5 mM MgCl2, 1.25 units of AmpliTaq Gold, 0.5 units of 
AmpErase uracil N-glycosylase (UNG), 200 nM each primer and 
100 nM probe. The thermal cycling conditions comprised 2 min at 
50°C and 10 min at 95°C, followed by 40 cycles at 
95°C for 15 s and 65°C for 1 min. Each assay included: a standard 
curve (from 10^5 to 10^2 copies) in duplicate, a no-template control, 
20 ng and 50 ng of calibrator human genomic DNA (Boehringer) in 
triplicate, and about 20 ng of unknown genomic DNA in triplicate 
(26 samples can thus be analyzed on a 96-well microplate). All 
samples with a coefficient of variation (CV) higher than 10% were 
retested. 
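
The plate layout and the retest rule can be checked with a short Python sketch. It assumes four 10-fold standard dilutions (10^5 to 10^2 copies), a single no-template well, and a CV computed from the sample standard deviation of the triplicate values; these details, like the function names, are assumptions and are not stated in the text.

```python
import statistics
from typing import Sequence

PLATE_WELLS = 96


def wells_used(n_unknowns: int,
               standard_points: int = 4,       # assumed: 10^5, 10^4, 10^3, 10^2 copies
               standard_replicates: int = 2,   # standards run in duplicate
               no_template_controls: int = 1,  # assumed: a single well
               calibrators: int = 2,           # 20 ng and 50 ng
               replicates: int = 3) -> int:    # calibrators and unknowns in triplicate
    """Count the wells occupied by one assay under the stated assumptions."""
    return (standard_points * standard_replicates
            + no_template_controls
            + calibrators * replicates
            + n_unknowns * replicates)


def cv_percent(values: Sequence[float]) -> float:
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def needs_retest(values: Sequence[float], max_cv_percent: float = 10.0) -> bool:
    """Flag a sample whose replicate CV exceeds the 10% limit."""
    return cv_percent(values) > max_cv_percent


if __name__ == "__main__":
    print(wells_used(26), "of", PLATE_WELLS, "wells")
    # Hypothetical triplicate copy-number estimates for one sample.
    print(needs_retest([6400.0, 6600.0, 6800.0]))
```

Under these assumptions, 26 unknown samples occupy 93 of the 96 wells.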

All reactions were performed in the ABI Prism 7700 Sequence 
Detection System (Perkin-Elmer Applied Biosystems), which 
detects the signal from the fluorogenic probe during PCR. 

Equipment for real-time detection. The 7700 system has a 
built-in thermal cycler and a laser directed via fiber optical cables 
to each of the 96 sample wells. A charge-coupled-device (CCD) 
camera collects the emission from each sample and the data are 
analyzed automatically. The software accompanying the 7700 
system calculates Ct and determines the starting copy number in the 
samples. 

Determination of gene amplification. Gene amplification was 
calculated as described above. Only samples with an N value 
higher than 2 were considered to be amplified. 

RESULTS 

To validate the method, real-time PCR was performed on 
genomic DNA extracted from 108 primary breast tumors, and 18 
normal leukocyte DNA samples from some of the same patients. 
The target genes were the myc, ccnd1 and erbB2 proto-oncogenes, 
and the β-amyloid precursor protein gene (app), which maps to a 
chromosome region (21q21.2) in which no genetic alterations have 
been found in breast tumors (Kallioniemi et al., 1994). The 
reference disomic gene was the albumin gene (alb, chromosome 
4q11-q13). 



Validation of the standard curve and dynamic range 
of real-time PCR 

The standard curve was constructed from PCR products serially 
diluted in genomic mouse DNA at a constant concentration of 
2 ng/µl. It should be noted that the 5 primer pairs chosen to analyze 
the 5 target genes do not amplify genomic mouse DNA (data not 
shown). Figure 1 shows the real-time PCR standard curve for the 
alb gene. The dynamic range was wide (at least 4 orders of 
magnitude), with samples containing as few as 10^2 copies or as 
many as 10^5 copies. 

Copy-number ratio of the 2 reference genes (app and alb) 

The app to alb copy-number ratio was determined in 18 normal 
leukocyte DNA samples and all 108 primary breast-tumor DNA 

Figure 1 - Albumin (alb) gene dosage by real-time PCR. Top: Amplification plots for reactions with starting alb gene copy number ranging 
from 10^5 (A9), 10^4 (A7), 10^3 (A4) to 10^2 (A2) and a no-template control (A1). Cycle number is plotted vs. change in normalized reporter signal 
(ΔRn). For each reaction tube, the fluorescence signal of the reporter dye (FAM) is divided by the fluorescence signal of the passive reference dye 
(ROX), to obtain a ratio defined as the normalized reporter signal (Rn). ΔRn represents the normalized reporter signal (Rn) minus the baseline 
signal established in the first 15 PCR cycles. ΔRn increases during PCR as alb PCR product copy number increases until the reaction reaches a 
plateau. Ct (threshold cycle) represents the fractional cycle number at which a significant increase in Rn above a baseline signal (horizontal black 
line) can first be detected. Two replicate plots were performed for each standard sample, but the data for only one are shown here. Bottom: 
Standard curve plotting log starting copy number vs. Ct (threshold cycle). The black dots represent the data for standard samples plotted in 
duplicate and the red dots the data for unknown genomic DNA samples plotted in triplicate. The standard curve shows 4 orders of linear dynamic 
range. 

samples. We selected these 2 genes because they are located in 2 
chromosome regions (app, 21q21.2; alb, 4q11-q13) in which no 
obvious genetic changes (including gains or losses) have been 
observed in breast cancers (Kallioniemi et al., 1994). The ratio for 
the 18 normal leukocyte DNA samples fell between 0.7 and 1.3 
(mean 1.02 ± 0.21), and was similar for the 108 primary breast- 
tumor DNA samples (0.6 to 1.6, mean 1.06 ± 0.25), confirming 
that alb and app are appropriate reference disomic genes for 
breast-tumor DNA. The low range of the ratios also confirmed that 
the nucleotide sequences chosen for the primers and probes were 
not polymorphic, as mismatches of their primers or probes with the 
subject's DNA would have resulted in differential amplification. 

myc, ccnd1 and erbB2 gene dose in normal leukocyte DNA 

To determine the cut-off point for gene amplification in breast- 
cancer tissue, 18 normal leukocyte DNA samples were tested for 
the gene dose (N), calculated as described in "Material and 
Methods". The N value of these samples ranged from 0.5 to 1.3 
(mean 0.84 ± 0.22) for myc; 0.7 to 1.6 (mean 1.06 ± 0.23) for 
ccnd1 and 0.6 to 1.3 (mean 0.91 ± 0.19) for erbB2. Since N values 
for myc, ccnd1 and erbB2 in normal leukocyte DNA consistently 
fell between 0.5 and 1.6, values of 2 or more were considered to 
represent gene amplification in tumor DNA. 
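
The cut-off rule can be expressed as a short Python sketch. The amplification threshold (N of 2 or more) is the one stated above; the lower threshold (N below 0.5) is the one applied to erbB2 in the tumor results. The function and label names are hypothetical.

```python
AMPLIFICATION_CUTOFF = 2.0  # N >= 2 is read as gene amplification
DECREASE_CUTOFF = 0.5       # N < 0.5 is read as a decreased copy number

def classify_gene_dose(n):
    """Classify a gene dose value N using the cut-offs described in the text."""
    if n >= AMPLIFICATION_CUTOFF:
        return "amplified"
    if n < DECREASE_CUTOFF:
        return "decreased"
    return "normal"

# Illustrative N values only.
for n in (0.4, 1.06, 2.0, 16.34):
    print(n, classify_gene_dose(n))
```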

myc, ccnd1 and erbB2 gene dose in breast-tumor DNA 

myc, ccnd1 and erbB2 gene copy numbers in the 108 primary 
breast tumors are reported in Table I. Extra copies of ccnd1 were 
more frequent (23%, 25/108) than extra copies of erbB2 (15%, 
16/108) and myc (10%, 11/108), and ranged from 2 to 18.6 for 
ccnd1, 2 to 15.1 for erbB2, and only 2 to 4.6 for the myc gene. 
Figure 2 and Table II represent tumors in which the ccnd1 gene was 
amplified 16-fold (T145), 6-fold (T133) and non-amplified (T118). 
The 3 genes were never found to be co-amplified in the same tumor. 
erbB2 and ccnd1 were co-amplified in only 3 cases, myc and ccnd1 
in 2 cases and myc and erbB2 in 1 case. This favors the hypothesis 
that gene amplifications are independent events in breast cancer. 
Interestingly, 5 tumors showed a decrease of at least 50% in the 
erbB2 copy number (N < 0.5), suggesting that they bore deletions 
of the 17q21 region (the site of erbB2). No such decrease in copy 
number was observed with the other 2 proto-oncogenes. 
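
The independence argument can be checked with simple arithmetic. The sketch below uses only the amplification frequencies reported above and computes the number of doubly amplified tumors that would be expected if amplifications of two genes occurred independently. The function name is hypothetical and no formal statistical test is implied.

```python
TOTAL_TUMORS = 108

# Number of tumors with extra copies of each gene, as reported in the text.
amplified = {"ccnd1": 25, "erbB2": 16, "myc": 11}

# Observed numbers of tumors with both genes amplified, as reported in the text.
observed_pairs = {
    ("erbB2", "ccnd1"): 3,
    ("myc", "ccnd1"): 2,
    ("myc", "erbB2"): 1,
}

def expected_coamplified(gene_a, gene_b):
    """Expected count of doubly amplified tumors under independence."""
    return amplified[gene_a] * amplified[gene_b] / TOTAL_TUMORS

for (gene_a, gene_b), observed in observed_pairs.items():
    expected = expected_coamplified(gene_a, gene_b)
    print(f"{gene_a}/{gene_b}: expected {expected:.2f}, observed {observed}")
```

Under this assumption the expected counts (about 3.7, 2.5 and 1.6) are close to the observed 3, 2 and 1 cases.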

Comparison of gene dose determined by real-time quantitative 
PCR and Southern-blot analysis 

Southern-blot analysis of myc, ccnd1 and erbB2 amplifications 
had previously been done on the same 108 primary breast tumors. A 
perfect correlation between the results of real-time PCR and 
Southern blot was obtained for tumors with high copy numbers 
(N > 5). However, there were cases (1 myc, 6 ccnd1 and 4 erbB2) 
in which real-time PCR showed gene amplification whereas 
Southern-blot did not, but these were mainly cases with low extra 
copy numbers (N from 2 to 2.9). 

DISCUSSION 

The clinical applications of gene amplification assays are 
currently limited, but would certainly increase if a simple, standard- 
ized and rapid method were perfected. Gene amplification status 
has been studied mainly by means of Southern blotting, but this 
method is not sensitive enough to detect low-level gene amplifica- 
tion nor accurate enough to quantify the full range of amplification 
values. Southern blotting is also time-consuming, uses radioactive 
reagents and requires relatively large amounts of high-quality 
genomic DNA, which means it cannot be used routinely in many 
laboratories. An amplification step is therefore required to determine 
the copy number of a given target gene from minimal 
quantities of tumor DNA (small early-stage tumors, cytopuncture 
specimens or formalin-fixed, paraffin-embedded tissues). 

TABLE I - DISTRIBUTION OF AMPLIFICATION LEVEL (N) FOR myc, 
ccnd1 AND erbB2 GENES IN 108 HUMAN BREAST TUMORS 

                      Amplification level (N) 
Gene      <0.5        0.5-1.9       2-4.9         ≥5 
myc       0           97 (89.8%)    11 (10.2%)    0 
ccnd1     0           83 (76.9%)    17 (15.7%)    8 (7.4%) 
erbB2     5 (4.6%)    87 (80.6%)    8 (7.4%)      8 (7.4%) 

In this study, we validated a PCR method developed for the 
quantification of gene over-representation in tumors. The method, 
based on real-time analysis of PCR amplification, has several 
advantages over other PCR-based quantitative assays such as 
competitive quantitative PCR (Celi et al., 1994). First, the real-time 
PCR method is performed in a closed-tube system, avoiding the 
risk of contamination by amplified products. Re-amplification of 
carryover PCR products in subsequent experiments can also be 
prevented by using the enzyme uracil N-glycosylase (UNG) 
(Longo et al., 1990). The second advantage is the simplicity and 
rapidity of sample analysis, since no post-PCR manipulations are 
required. Our results show that the automated method is reliable. 
We found it possible to determine, in triplicate, the number of 
copies of a target gene in more than 100 tumors per day. Third, the 
system has a linear dynamic range of at least 4 orders of magnitude, 
meaning that samples do not have to contain equal starting amounts 
of DNA. This technique should therefore be suitable for analyzing 
formalin-fixed, paraffin-embedded tissues. Fourth, and above all, 
real-time PCR makes DNA quantification much more precise and 
reproducible, since it is based on Ct values rather than end-point 
measurement of the amount of accumulated PCR product. Indeed, 
the ABI Prism 7700 Sequence Detection System enables Ct to be 
calculated when PCR amplification is still in the exponential phase 
and when none of the reaction components is rate-limiting. The 
within-run CV of the Ct value for calibrator human DNA (5 
replicates) was always below 5%, and the between-assay precision 
in 5 different runs was always below 10% (data not shown). In 
addition, the use of a standard curve is not absolutely necessary, 
since the copy number can be determined simply by comparing the 
Ct ratio of the target gene with that of reference genes. The results 
obtained by the 2 methods (with and without a standard curve) are 
similar in our experiments (data not shown). Moreover, unlike 
competitive quantitative PCR, real-time PCR does not require an 
internal control (the design and storage of internal controls and the 
validation of their amplification efficiency is laborious). 
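
The calculation without a standard curve can be sketched as follows. This is the widely used comparative Ct calculation, offered as an assumption about how such a comparison is typically made rather than as the exact computation used in this study. It assumes that target and reference genes amplify with the same efficiency (doubling per cycle by default); the function and parameter names are hypothetical.

```python
def relative_copy_number(ct_target_sample, ct_ref_sample,
                         ct_target_calibrator, ct_ref_calibrator,
                         efficiency=2.0):
    """Comparative Ct estimate of target gene dose relative to a calibrator.

    Assumes equal amplification efficiency for target and reference genes.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return efficiency ** -(delta_sample - delta_calibrator)

# Illustrative values: the target crosses the threshold 2 cycles earlier in
# the tumor than in the calibrator, while the reference gene is unchanged.
print(relative_copy_number(24.0, 26.0, 26.0, 26.0))
```

A target that reaches the threshold 2 cycles earlier than in the calibrator, with an unchanged reference gene, corresponds to a 4-fold gene dose under these assumptions.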

The only potential disadvantage of real-time PCR, like all other 
PCR-based methods and solid-matrix blotting techniques (Southern 
blots and dot blots), is that it cannot avoid dilution artifacts 
inherent in the extraction of DNA from tumor cells contained in 
heterogeneous tissue specimens. Only FISH and immunohistochemistry 
can measure alterations on a cell-by-cell basis (Pauletti et al., 
1996; Slamon et al., 1989). However, FISH requires expensive 
equipment and trained personnel and is also time-consuming. 
Moreover, FISH does not assess gene expression and therefore 
cannot detect cases in which the gene product is over-expressed in 
the absence of gene amplification, which will be possible in the 
future by real-time quantitative RT-PCR. Immunohistochemistry is 
subject to considerable variations in the hands of different teams, 
owing to alterations of target proteins during the procedure, the 
different primary antibodies and fixation methods used and the 
criteria used to define positive staining. 

The results of this study are in agreement with those reported in 
the literature. (i) Chromosome regions 4q11-q13 and 21q21.2 
(which bear alb and app, respectively) showed no genetic alterations 
in the breast-cancer samples studied here, in keeping with 
the results of CGH (Kallioniemi et al., 1994). (ii) We found that 
amplifications of these 3 oncogenes were independent events, as 
reported by other teams (Berns et al., 1992; Borg et al., 1992). (iii) 
The frequency and degree of myc amplification in our breast tumor 
DNA series were lower than those of ccnd1 and erbB2 amplification, 
confirming the findings of Borg et al. (1992) and Courjal et al. 
(1997). (iv) The maxima of ccnd1 and erbB2 over-representation 
were 18-fold and 15-fold, also in keeping with earlier results (about 
30-fold maximum) (Berns et al., 1992; Borg et al., 1992; Courjal et 
al., 1997). (v) The erbB2 copy numbers obtained with real-time 
PCR were in good agreement with data obtained with other 
quantitative PCR-based assays in terms of the frequency and 
degree of amplification (An et al., 1995; Deng et al., 1996; Valeron 
et al., 1996). Our results also correlate well with those recently 
published by Gelmini et al. (1997), who used the TaqMan system to 
measure erbB2 amplification in a small series of breast tumors 
(n = 25), but with an instrument (LS-50B luminescence spectrometer, 
Perkin-Elmer Applied Biosystems) which only allows end-point 
measurement of fluorescence intensity. Here we report myc 
and ccnd1 gene dosage in breast cancer by means of quantitative 
PCR. (vi) We found a high degree of concordance between 
real-time quantitative PCR and Southern blot analysis in terms of 
gene amplification, especially for samples with high copy numbers 
(≥5-fold). The slightly higher frequency of gene amplification 
(especially ccnd1 and erbB2) observed by means of real-time 
quantitative PCR as compared with Southern-blot analysis may be 
explained by the higher sensitivity of the former method. However, 
we cannot rule out the possibility that some tumors with a few extra 
gene copies observed in real-time PCR had additional copies of an 
arm or a whole chromosome (trisomy, tetrasomy or polysomy) 
rather than true gene amplification. These 2 types of genetic 
alteration (polysomy and gene amplification) could be easily 
distinguished in the future by using an additional probe located on 
the same chromosome arm, but some distance from the target gene. 
It is noteworthy that high gene copy numbers have the greatest 
prognostic significance in breast carcinoma (Borg et al., 1992; 
Slamon et al., 1987). 

Figure 2 - ccnd1 and alb gene dosage by real-time PCR in 3 breast 
tumor samples: T118 (E12, C6, black squares), T133 (G11, B4, red 
squares) and T145 (A8, C8, blue squares). Given the Ct of each 
sample, the initial copy number is inferred from the standard curve 
obtained during the same experiment. Triplicate plots were performed 
for each tumor sample, but the data for only one are shown here. The 
results are shown in Table II. 

TABLE II - EXAMPLES OF ccnd1 GENE DOSAGE RESULTS 
FROM 3 BREAST TUMORS (1) 

                   ccnd1                           alb 
Tumor    Copy number    Mean      SD     Copy number    Mean     SD    N ccnd1/alb 
T118         4525                            4223 
             4605       4603      77         4365       4325     89       1.06 
             4678                            4387 
T133        59821                            9787 
            61659      61100    1111        10092      10137    375       6.03 
            61821                           10533 
T145       128563                            7321 
           125892     125392    3448         7762       7672    316      16.34 
           121722                            7933 

(1) For each sample, 3 replicate experiments were performed and the mean 
and the standard deviation (SD) were determined. The level of ccnd1 gene 
amplification (N ccnd1/alb) is determined by dividing the average ccnd1 
copy number value by the average alb copy number value. 
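
The gene dosage calculation described in the footnote to Table II can be reproduced with a short Python sketch. The replicate copy numbers are those listed in Table II; the dictionary layout and function name are hypothetical, and the sample standard deviation is assumed because it matches the tabulated SD values.

```python
from statistics import mean, stdev

# Replicate copy numbers from Table II (3 replicates per gene and tumor).
replicates = {
    "T118": {"ccnd1": [4525, 4605, 4678], "alb": [4223, 4365, 4387]},
    "T133": {"ccnd1": [59821, 61659, 61821], "alb": [9787, 10092, 10533]},
    "T145": {"ccnd1": [128563, 125892, 121722], "alb": [7321, 7762, 7933]},
}

def gene_dose(sample):
    """N = mean ccnd1 copy number divided by mean alb copy number."""
    return mean(sample["ccnd1"]) / mean(sample["alb"])

for tumor, sample in replicates.items():
    print(
        f"{tumor}: ccnd1 mean={mean(sample['ccnd1']):.0f} "
        f"SD={stdev(sample['ccnd1']):.0f}, "
        f"alb mean={mean(sample['alb']):.0f} "
        f"SD={stdev(sample['alb']):.0f}, "
        f"N={gene_dose(sample):.2f}"
    )
```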

Finally, this technique can be applied to the detection of gene 
deletion as well as gene amplification. Indeed, we found a 
decreased copy number of erbB2 (but not of the other 2 proto- 
oncogenes) in several tumors; erbB2 is located in a chromosome 
region (17q21) reported to contain both deletions and amplifica- 
tions in breast cancer (Bieche and Lidereau, 1995). 

In conclusion, gene amplification in various cancers can be used 
as a marker of pre-neoplasia, also for early diagnosis of cancer, 
staging, prognostication and choice of treatment. Southern blotting 
is not sufficiently sensitive, and FISH is lengthy and complex. 
Real-time quantitative PCR overcomes both these limitations, and 
is a sensitive and accurate method of analyzing large numbers of 
samples in a short time. It should find a place in routine clinical 
gene dosage. 

ABSTRACT Wnt family members are critical to many 
developmental processes, and components of the Wnt signaling 
pathway have been linked to tumorigenesis in familial and 
sporadic colon carcinomas. Here we report the identification 
of two genes, WISP-1 and WISP-2, that are up-regulated in the 
mouse mammary epithelial cell line C57MG transformed by 
Wnt-1, but not by Wnt-4. Together with a third related gene, 
WISP-3, these proteins define a subfamily of the connective 
tissue growth factor family. Two distinct systems demonstrated 
WISP induction to be associated with the expression of 
Wnt-1. These included (i) C57MG cells infected with a Wnt-1 
retroviral vector or expressing Wnt-1 under the control of a 
tetracycline-repressible promoter, and (ii) Wnt-1 transgenic 
mice. The WISP-1 gene was localized to human chromosome 
8q24.1-8q24.3. WISP-1 genomic DNA was amplified in colon 
cancer cell lines and in human colon tumors and its RNA 
overexpressed (2- to >30-fold) in 84% of the tumors examined 
compared with patient-matched normal mucosa. WISP-3 
mapped to chromosome 6q22-6q23 and also was overexpressed 
(4- to >40-fold) in 63% of the colon tumors analyzed. 
In contrast, WISP-2 mapped to human chromosome 20q12-20q13 
and its DNA was amplified, but RNA expression was 
reduced (2- to >30-fold) in 79% of the tumors. These results 
suggest that the WISP genes may be downstream of Wnt-1 
signaling and that aberrant levels of WISP expression in colon 
cancer may play a role in colon tumorigenesis. 



Wnt-1 is a member of an expanding family of cysteine-rich, 
glycosylated signaling proteins that mediate diverse develop- 
mental processes such as the control of cell proliferation, 
adhesion, cell polarity, and the establishment of cell fates (1, 
2). Wnt-1 originally was identified as an oncogene activated by 
the insertion of mouse mammary tumor virus in virus-induced 
mammary adenocarcinomas (3, 4). Although Wnt-1 is not 
expressed in the normal mammary gland, expression of Wnt-1 
in transgenic mice causes mammary tumors (5). 

In mammalian cells, Wnt family members initiate signaling 
by binding to the seven-transmembrane spanning Frizzled 
receptors and recruiting the cytoplasmic protein Dishevelled 
(Dsh) to the cell membrane (1, 2, 6). Dsh then inhibits the 
kinase activity of the normally constitutively active glycogen 
synthase kinase-3β (GSK-3β), resulting in an increase in 
β-catenin levels. Stabilized β-catenin interacts with the 
transcription factor TCF/Lef1, forming a complex that appears in 
the nucleus and binds TCF/Lef1 target DNA elements to 
activate transcription (7, 8). Other experiments suggest that 
the adenomatous polyposis coli (APC) tumor suppressor gene 
also plays an important role in Wnt signaling by regulating 
β-catenin levels (9). APC is phosphorylated by GSK-3β, binds 
to β-catenin, and facilitates its degradation. Mutations in 
either APC or β-catenin have been associated with colon 
carcinomas and melanomas, suggesting these mutations con- 
tribute to the development of these types of cancer, implicating 
the Wnt pathway in tumorigenesis (1). 

Although much has been learned about the Wnt signaling 
pathway over the past several years, only a few of the tran- 
scriptionally activated downstream components activated by 
Wnt have been characterized. Those that have been described 
cannot account for all of the diverse functions attributed to 
Wnt signaling. Among the candidate Wnt target genes are 
those encoding the nodal-related 3 gene, Xnr3, a member of 
the transforming growth factor (TGF)-β superfamily, and the 
homeobox genes, engrailed, goosecoid, twin (Xtwn), and siamois 
(2). A recent report also identifies c-myc as a target gene of the 
Wnt signaling pathway (10). 

To identify additional downstream genes in the Wnt signal- 
ing pathway that are relevant to the transformed cell pheno- 
type, we used a PCR-based cDNA subtraction strategy, sup- 
pression subtractive hybridization (SSH) (11), using RNA 
isolated from C57MG mouse mammary epithelial cells and 
C57MG cells stably transformed by a Wnt-1 retrovirus. Over- 
expression of Wnt-1 in this cell line is sufficient to induce a 
partially transformed phenotype, characterized by elongated 
and refractile cells that lose contact inhibition and form a 
multilayered array (12, 13). We reasoned that genes differen- 
tially expressed between these two cell lines might contribute 
to the transformed phenotype. 

In this paper, we describe the cloning and characterization 
of two genes up-regulated in Wnt-1 transformed cells, WISP-1 
and WISP-2, and a third related gene, WISP-3. The WISP genes 
are members of the CCN family of growth factors, which 
includes connective tissue growth factor (CTGF), Cyr61, and 
nov, a family not previously linked to Wnt signaling. 

MATERIALS AND METHODS 

SSH. SSH was performed by using the PCR-Select cDNA 
Subtraction Kit (CLONTECH). Tester double-stranded 
cDNA was synthesized from 2 µg of poly(A)+ RNA isolated 
from the C57MG/Wnt-1 cell line and driver cDNA from 2 µg 
of poly(A)+ RNA from the parent C57MG cells. The subtracted 
cDNA library was subcloned into a pGEM-T vector for 
further analysis. 

Abbreviations: TGF, transforming growth factor; CTGF, connective 
tissue growth factor; SSH, suppression subtractive hybridization; 
VWC, von Willebrand factor type C module. 
Data deposition: The sequences reported in this paper have been 
deposited in the GenBank database (accession nos. AF100777, 
AF100778, AF100779, AF100780, and AF100781). 
To whom reprint requests should be addressed. e-mail: diane@gene.com. 

cDNA Library Screening. Clones encoding full-length 
mouse WISP-1 were isolated by screening a λgt10 mouse 
embryo cDNA library (CLONTECH) with a 70-bp probe from 
the original partial clone 568 sequence corresponding to amino 
acids 128-169. Clones encoding full-length human WISP-1 
were isolated by screening λgt10 lung and fetal kidney cDNA 
libraries with the same probe at low stringency. Clones encoding 
full-length mouse and human WISP-2 were isolated by 
screening a C57MG/Wnt-1 or human fetal lung cDNA library 
with a probe corresponding to nucleotides 1463-1512. Full-length 
cDNAs encoding WISP-3 were cloned from human 
bone marrow and fetal kidney libraries. 

Expression of Human WISP RNA. PCR amplification of 
first-strand cDNA was performed with human Multiple Tissue 
cDNA panels (CLONTECH) and 300 µM of each dNTP at 
94°C for 1 sec, 62°C for 30 sec, 72°C for 1 min, for 22-32 cycles. 
WISP and glyceraldehyde-3-phosphate dehydrogenase primer 
sequences are available on request. 

In Situ Hybridization. 33P-labeled sense and antisense riboprobes 
were transcribed from an 897-bp PCR product corresponding 
to nucleotides 601-1440 of mouse WISP-1 or a 
294-bp PCR product corresponding to nucleotides 82-375 of 
mouse WISP-2. All tissues were processed as described (40). 

Radiation Hybrid Mapping. Genomic DNA from each 
hybrid in the Stanford G3 and Genebridge4 Radiation Hybrid 
Panels (Research Genetics, Huntsville, AL) and human and 
hamster control DNAs were PCR-amplified, and the results 
were submitted to the Stanford or Massachusetts Institute of 
Technology web servers. 

Cell Lines, Tumors, and Mucosa Specimens. Tissue speci- 
mens were obtained from the Department of Pathology (Uni- 
versity of Pittsburgh) for patients undergoing colon resection 
and from the University of Leeds, United Kingdom. Genomic 
DNA was isolated (Qiagen) from the pooled blood of 10 
normal human donors, surgical specimens, and the following 
ATCC human cell lines: SW480, COLO 320DM, HT-29, 
WiDr, and SW403 (colon adenocarcinomas), SW620 (lymph 
node metastasis, colon adenocarcinoma), HCT 116 (colon 
carcinoma), SK-CO-1 (colon adenocarcinoma, ascites), and 
HM7 (a variant of ATCC colon adenocarcinoma cell line LS 
174T). DNA concentration was determined by using Hoechst 
dye 33258 intercalation fluorimetry. Total RNA was prepared 
by homogenization in 7 M GuSCN followed by centrifugation 
over CsCl cushions or prepared by using RNAzol. 

Gene Amplification and RNA Expression Analysis. Relative 
gene amplification and RNA expression of WISPs and c-myc in 
the cell lines, colorectal tumors, and normal mucosa were 
determined by quantitative PCR. Gene-specific primers and 
fluorogenic probes (sequences available on request) were 
designed and used to amplify and quantitate the genes. The 
relative gene copy number was derived by using the formula 
2^(ΔCt), where ΔCt represents the difference in amplification 
cycles required to detect the WISP genes in peripheral blood 
lymphocyte DNA compared with colon tumor DNA or colon 
tumor RNA compared with normal mucosal RNA. The 
δ-method was used for calculation of the SE of the gene copy 
number or RNA expression level. The WISP-specific signal was 
normalized to that of the glyceraldehyde-3-phosphate dehydrogenase 
housekeeping gene. All TaqMan assay reagents 
were obtained from Perkin-Elmer Applied Biosystems. 
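
The 2^(ΔCt) calculation and its standard error can be sketched in Python as follows. The order in which the glyceraldehyde-3-phosphate dehydrogenase normalization is applied and the first-order (delta-method) error propagation are assumptions made for illustration, since the text does not spell them out; the function names and Ct values are hypothetical.

```python
import math

def relative_level(ct_gene_ref, ct_gapdh_ref, ct_gene_test, ct_gapdh_test):
    """Return (2**delta_ct, delta_ct) after normalizing each Ct to GAPDH.

    'ref' is the comparison sample (e.g. blood lymphocyte DNA) and 'test'
    the sample of interest (e.g. colon tumor DNA).
    """
    delta_ct = (ct_gene_ref - ct_gapdh_ref) - (ct_gene_test - ct_gapdh_test)
    return 2.0 ** delta_ct, delta_ct

def delta_method_se(delta_ct, se_delta_ct):
    """First-order SE of 2**delta_ct: |d/dx 2**x| * SE(x) = ln(2) * 2**x * SE(x)."""
    return math.log(2.0) * (2.0 ** delta_ct) * se_delta_ct

# Invented Ct values: the gene is detected 2 normalized cycles earlier in the tumor.
level, delta_ct = relative_level(30.0, 20.0, 28.0, 20.0)
print(level, delta_method_se(delta_ct, 0.1))
```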

RESULTS 

Isolation of WISP-1 and WISP-2 by SSH. To identify Wnt-1-inducible 
genes, we used the technique of SSH using the 
mouse mammary epithelial cell line C57MG and C57MG cells 
that stably express Wnt-1 (11). Candidate differentially expressed 
cDNAs (1,384 total) were sequenced. Thirty-nine 
percent of the sequences matched known genes or homo- 
logues, 32% matched expressed sequence tags, and 29% had 
no match. To confirm that the transcript was differentially 
expressed, semiquantitative reverse transcription-PCR and 
Northern analysis were performed by using mRNA from the 
C57MG and C57MG/Wnt-1 cells. 

Two of the cDNAs, WISP-1 and WISP-2, were differentially 
expressed, being induced in the C57MG/Wnt-1 cell line, but 
not in the parent C57MG cells or C57MG cells overexpressing 
Wnt-4 (Fig. 1 A and B). Wnt-4, unlike Wnt-1, does not induce 
the morphological transformation of C57MG cells and has no 
effect on β-catenin levels (13, 14). Expression of WISP-1 was 
up-regulated approximately 3-fold in the C57MG/Wnt-1 cell 
line and WISP-2 by approximately 5-fold by both Northern 
analysis and reverse transcription-PCR. 

An independent, but similar, system was used to examine 
WISP expression after Wnt-1 induction. C57MG cells express- 
ing the Wnt-1 gene under the control of a tetracycline- 
repressible promoter produce low amounts of Wnt-1 in the 
repressed state but show a strong induction of Wnt-1 mRNA 
and protein within 24 hr after tetracycline removal (8). The 
levels of Wnt-1 and WISP RNA isolated from these cells at 
various times after tetracycline removal were assessed by 
quantitative PCR. Strong induction of Wnt-1 mRNA was seen 
as early as 10 hr after tetracycline removal. Induction of WISP 
mRNA (2- to 6-fold) was seen at 48 and 72 hr (data not shown). 
These data support our previous observations that show that 
WISP induction is correlated with Wnt-1 expression. Because 
the induction is slow, occurring after approximately 48 hr, the 
induction of WISPs may be an indirect response to Wnt-1 
signaling. 

cDNA clones of human WISP-1 were isolated and the 
sequence compared with mouse WISP-1. The cDNA sequences 
of mouse and human WISP-1 were 1,766 and 2,830 bp in length, 
respectively, and encode proteins of 367 aa, with predicted 
relative molecular masses of ~40,000 (Mr 40 K). Both have 
hydrophobic N-terminal signal sequences, 38 conserved cys- 
teine residues, and four potential N-linked glycosylation sites 
and are 84% identical (Fig. 2A). 

Full-length cDNA clones of mouse and human WISP-2 were 
1,734 and 1,293 bp in length, respectively, and encode proteins 
of 251 and 250 aa, respectively, with predicted relative molecular 
masses of ~27,000 (Mr 27 K) (Fig. 2B). Mouse and human 
WISP-2 are 73% identical. Human WISP-2 has no potential 
N-linked glycosylation sites, and mouse WISP-2 has one at 
position 197. WISP-2 has 28 cysteine residues that are conserved 
among the 38 cysteines found in WISP-1. 

Fig. 1. WISP-1 and WISP-2 are induced by Wnt-1, but not Wnt-4, 
expression in C57MG cells. Northern analysis of WISP-1 (A) and 
WISP-2 (B) expression in C57MG, C57MG/Wnt-1, and C57MG/Wnt-4 
cells. Poly(A)+ RNA (2 µg) was subjected to Northern blot 
analysis and hybridized with a 70-bp mouse WISP-1-specific probe 
(amino acids 278-300) or a 190-bp WISP-2-specific probe (nucleotides 
1438-1627) in the 3' untranslated region. Blots were rehybridized with 
human β-actin probe. 

Fig. 2. Encoded amino acid sequence alignment of mouse and 
human WISP-1 (A) and mouse and human WISP-2 (B). The potential 
signal sequence, insulin-like growth factor-binding protein (IGF-BP), 
VWC, thrombospondin (TSP), and C-terminal (CT) domains are 
underlined. 

Identification of WISP-3. To search for related proteins, we 
screened expressed sequence tag (EST) databases with the 
WISP-1 protein sequence and identified several ESTs as 
potentially related sequences. We identified a homologous 
protein that we have called WISP-3. A full-length human 
WISP-3 cDNA of 1,371 bp was isolated corresponding to those 
ESTs that encode a 354-aa protein with a predicted molecular 
mass of 39,293. WISP-3 has two potential N-linked glycosylation 
sites and 36 cysteine residues. An alignment of the three 
human WISP proteins shows that WISP-1 and WISP-3 are the 
most similar (42% identity), whereas WISP-2 has 37% identity 
with WISP-1 and 32% identity with WISP-3 (Fig. 3A). 

WISPs Are Homologous to the CTGF Family of Proteins. 
Human WISP-1, WISP-2, and WISP-3 are novel sequences; 
however, mouse WISP-1 is the same as the recently identified 
Elml gene. Elml is expressed in low, but not high, metastatic 
mouse melanoma cells, and suppresses the in vivo growth and 
metastatic potential of K-1735 mouse melanoma cells (15). 
Human and mouse WISP-2 are homologous to the recently 
described rat gene, rCop-1 (16). Significant homology (36- 
44%) was seen to the CCN family of growth factors. This family 
includes three members, CTGF, Cyr61, and the protoonco- 
gene nov. CTGF is a chemotactic and mitogenic factor for 
fibroblasts that is implicated in wound healing and fibrotic 
disorders and is induced by TGF-β (17). Cyr61 is an extracellular 
matrix signaling molecule that promotes cell adhesion, 
proliferation, migration, angiogenesis, and tumor growth (18, 
19). nov (nephroblastoma overexpressed) is an immediate 
early gene associated with quiescence and found altered in 
Wilms tumors (20). The proteins of the CCN family share 
functional, but not sequence, similarity to Wnt-1. All are 
secreted, cysteine-rich heparin binding glycoproteins that as- 
sociate with the cell surface and extracellular matrix. 

Fig. 3. (A) Encoded amino acid sequence alignment of human 
WISPs. The cysteine residues of WISP-1 and WISP-2 that are not 
present in WISP-3 are indicated with a dot. (B) Schematic representation 
of the WISP proteins showing the domain structure and cysteine 
residues (vertical lines). The four cysteine residues in the VWC domain 
that are absent in WISP-3 are indicated with a dot. (C) Expression of 
WISP mRNA in human tissues. PCR was performed on human 
multiple-tissue cDNA panels (CLONTECH) from the indicated adult 
and fetal tissues. 

WISP proteins exhibit the modular architecture of the CCN 
family, characterized by four conserved cysteine-rich domains 
(Fig. 3B) (21). The N-terminal domain, which includes the first 
12 cysteine residues, contains a consensus sequence (GCGCCXXC) 
conserved in most insulin-like growth factor (IGF)-binding 
proteins (BP). This sequence is conserved in WISP-2 
and WISP-3, whereas WISP-1 has a glutamine in the third 
position instead of a glycine. CTGF recently has been shown 
to specifically bind IGF (22) and a truncated nov protein 
lacking the IGF-BP domain is oncogenic (23). The von Wil- 
lebrand factor type C module (VWC), also found in certain 
collagens and mucins, covers the next 10 cysteine residues, and 
is thought to participate in protein complex formation and 
oligomerization (24). The VWC domain of WISP-3 differs 
from all CCN family members described previously, in that it 
contains only six of the 10 cysteine residues (Fig. 3 A and B). 
A short variable region follows the VWC domain. The third 
module, the thrombospondin (TSP) domain is involved in 
binding to sulfated glycoconjugates and contains six cysteine 
residues and a conserved WSxCSxxCG motif first identified in 
thrombospondin (25). The C-terminal (CT) module contain- 
ing the remaining 10 cysteines is thought to be involved in 
dimerization and receptor binding (26). The CT domain is 
present in all CCN family members described to date but is 
absent in WISP-2 (Fig. 3 A and B). The existence of a putative 
signal sequence and the absence of a transmembrane domain 
suggest that WISPs are secreted proteins, an observation 
supported by an analysis of their expression and secretion from 
mammalian cell and baculovirus cultures (data not shown). 

Expression of WISP mRNA in Human Tissues. Tissue-specific 
expression of human WISPs was characterized by PCR 
analysis on adult and fetal multiple tissue cDNA panels. 
WISP-1 expression was seen in the adult heart, kidney, lung, 
pancreas, placenta, ovary, small intestine, and spleen (Fig. 3C). 
Little or no expression was detected in the brain, liver, skeletal 
muscle, colon, peripheral blood leukocytes, prostate, testis, or 
thymus. WISP-2 had a more restricted tissue expression and 
was detected in adult skeletal muscle, colon, ovary, and fetal 
lung. Predominant expression of WISP-3 was seen in adult 
kidney and testis and fetal kidney. Lower levels of WISP-3 
expression were detected in placenta, ovary, prostate, and 
small intestine. 

In Situ Localization of WISP-1 and WISP-2. Expression of 
WISP-1 and WISP-2 was assessed by in situ hybridization in 
mammary tumors from Wnt-1 transgenic mice. Strong expression 
of WISP-1 was observed in stromal fibroblasts lying within 
the fibrovascular tumor stroma (Fig. 4 A-D). However, low-level 
WISP-1 expression also was observed focally within tumor 
cells (data not shown). No expression was observed in normal 
breast. Like WISP-1, WISP-2 expression also was seen in the 
tumor stroma in breast tumors from Wnt-1 transgenic animals 
(Fig. 4 E-H). However, WISP-2 expression in the stroma was 
in spindle-shaped cells adjacent to capillary vessels, whereas 
the predominant cell type expressing WISP-1 was the stromal 
fibroblasts. 

Fig. 4. (A, C, E, and G) Representative hematoxylin/eosin-stained 
images from breast tumors in Wnt-1 transgenic mice. The corresponding 
dark-field images showing WISP-1 expression are shown in B and 
D. The tumor is a moderately well-differentiated adenocarcinoma 
showing evidence of adenoid cystic change. At low power (A and B), 
expression of WISP-1 is seen in the delicate branching fibrovascular 
tumor stroma (arrowhead). At higher magnification, expression is seen 
in the stromal (s) fibroblasts (C and D), and tumor cells are negative. 
Focal expression of WISP-1, however, was observed in tumor cells in 
some areas. Images of WISP-2 expression are shown in E-H. At low 
power (E and F), expression of WISP-2 is seen in cells lying within the 
fibrovascular tumor stroma. At higher magnification, these cells 
appeared to be adjacent to capillary vessels whereas tumor cells are 
negative (G and H). 

Chromosome Localization of the WISP Genes. The chro- 
mosomal location of the human WISP genes was determined 
by radiation hybrid mapping panels. WISP-1 is approximately
3.48 cR from the meiotic marker AFM259xc5 [logarithm of
odds (lod) score 16.31] on chromosome 8q24.1 to 8q24.3, in the
same region as the human locus of the novH family member 
(27) and roughly 4 Mbs distal to c-myc (28). Preliminary fine 
mapping indicates that WISP-1 is located near D8S1712 STS. 
WISP-2 is linked to the marker SHGC-33922 (lod = 1,000) on 
chromosome 20q12-20q13.1. Human WISP-3 mapped to
chromosome 6q22-6q23 and is linked to the marker AFM211ze5
(lod = 1,000). WISP-3 is approximately 18 Mbs proximal to
CTGF and 23 Mbs proximal to the human cellular oncogene 
MYB (27, 29). 

Amplification and Aberrant Expression of WISPs in Human
Colon Tumors. Amplification of protooncogenes is seen in 
many human tumors and has etiological and prognostic sig- 
nificance. For example, in a variety of tumor types, c-myc 
amplification has been associated with malignant progression 
and poor prognosis (30). Because WISP-1 resides in the same 
general chromosomal location (8q24) as c-myc, we asked 
whether it was a target of gene amplification, and, if so, 
whether this amplification was independent of the c-myc locus. 
Genomic DNA from human colon cancer cell lines was 
assessed by quantitative PCR and Southern blot analysis (Fig.
5 A and B). Both methods detected similar degrees of WISP-1
amplification. Most cell lines showed significant (2- to 4-fold) 
amplification, with the HT-29 and WiDr cell lines demonstrat- 
ing an 8-fold increase. Significantly, the pattern of amplifica- 
tion observed did not correlate with that observed for c-myc, 
indicating that the c-myc gene is not part of the amplicon that 
involves the WISP-1 locus. 

We next examined whether the WISP genes were amplified 
in a panel of 25 primary human colon adenocarcinomas. The 
relative WISP gene copy number in each colon tumor DNA 
was compared with pooled normal DNA from 10 donors by 
quantitative PCR (Fig. 6). The copy number of WISP-1 and 
WISP-2 was significantly greater than one, approximately 
2-fold for WISP-1 in about 60% of the tumors and 2- to 4-fold 
for WISP-2 in 92% of the tumors (P < 0.001 for each). The 
copy number for WISP-3 was indistinguishable from one (P =
0.166). In addition, the copy number of WISP-2 was signifi- 
cantly higher than that of WISP-1 (P < 0.001). 
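
The passage above reports gene copy number in tumor DNA relative to pooled normal DNA, but this section does not spell out the calculation. As a minimal sketch only, assuming the widely used comparative threshold-cycle (delta-delta Ct) approach with a reference gene and roughly 100% amplification efficiency (assumptions that the text here does not confirm), a relative copy number could be computed as follows. The function name and the Ct values are hypothetical and serve only as illustration.

```python
def relative_copy_number(ct_target_tumor, ct_ref_tumor,
                         ct_target_normal, ct_ref_normal):
    """Relative copy number by the comparative Ct method (an assumption here).

    Each argument is a threshold cycle (Ct). The target gene is normalized
    to a reference gene in each sample, and the tumor is then compared with
    the normal pool. Assumes the amount of product doubles every cycle.
    """
    delta_tumor = ct_target_tumor - ct_ref_tumor
    delta_normal = ct_target_normal - ct_ref_normal
    delta_delta = delta_tumor - delta_normal
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold one cycle earlier
# in the tumor than in the normal pool, with an unchanged reference gene.
print(relative_copy_number(24.0, 25.0, 25.0, 25.0))
```

Under these assumptions, a value near one means no change in copy number, and a value near two corresponds to the roughly 2-fold amplification described for WISP-1.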

The levels of WISP transcripts in RNA isolated from 19 
adenocarcinomas and their matched normal mucosa were

Fig. 5. Amplification of WISP-1 genomic DNA in colon cancer cell
lines. (A) Amplification in cell line DNA was determined by
quantitative PCR. (B) Southern blots containing genomic DNA (10 µg)
digested with EcoRI (WISP-1) or XbaI (c-myc) were hybridized with
a 100-bp human WISP-1 probe (amino acids 186-219) or a human
c-myc probe (located at bp 1901-2000). The WISP and myc genes are
detected in normal human genomic DNA after a longer film exposure.

Fig. 6. Genomic amplification of WISP genes in human colon
tumors. The relative gene copy number of the WISP genes in 25 
adenocarcinomas was assayed by quantitative PCR, by comparing 
DNA from primary human tumors with pooled DNA from 10 healthy 
donors. The data are means ± SEM from one experiment done in 
triplicate. The experiment was repeated at least three times. 

assessed by quantitative PCR (Fig. 7). The level of WISP-1 
RNA present in tumor tissue varied but was significantly 
increased (2- to >25-fold) in 84% (16/19) of the human colon 
tumors examined compared with normal adjacent mucosa. 
Four of 19 tumors showed greater than 10-fold overexpression. 
In contrast, in 79% (15/19) of the tumors examined, WISP-2 
RNA expression was significantly lower in the tumor than the 
mucosa. Similar to WISP-1, WISP-3 RNA was overexpressed in
63% (12/19) of the colon tumors compared with the normal 



[Fig. 7 graph: panels WISP-1, WISP-2, and WISP-3; x-axis, Patient #/Dukes Stage.]

Fig. 7. WISP RNA expression in primary human colon tumors 
relative to expression in normal mucosa from the same patient. 
Expression of WISP mRNA in 19 adenocarcinomas was assayed by 
quantitative PCR. The Dukes stage of the tumor is listed under the 
sample number. The data are means ± SEM from one experiment 
done in triplicate. The experiment was repeated at least twice. 



mucosa. The amount of overexpression of WISP-3 ranged from
4- to >40-fold. 
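
The percentages quoted for the 19 matched tumor/mucosa pairs can be checked directly against the stated counts. The short sketch below is an arithmetic check only: the counts and percentages come from the text, while the variable and function names are illustrative.

```python
# Counts of tumors (out of 19 matched pairs) and the percentages
# stated in the text for each WISP transcript.
TOTAL_TUMORS = 19
reported = {
    "WISP-1 overexpressed": (16, 84),
    "WISP-2 reduced": (15, 79),
    "WISP-3 overexpressed": (12, 63),
}


def percent(count, total):
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


for label, (count, stated) in reported.items():
    computed = percent(count, TOTAL_TUMORS)
    print(f"{label}: {count}/{TOTAL_TUMORS} = {computed}% (text states {stated}%)")
```

Each computed value agrees with the percentage given in the text once rounded to the nearest whole number.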

DISCUSSION 

One approach to understanding the molecular basis of cancer 
is to identify differences in gene expression between cancer 
cells and normal cells. Strategies based on assumptions that 
steady-state mRNA levels will differ between normal and 
malignant cells have been used to clone differentially ex- 
pressed genes (31). We have used a PCR-based selection 
strategy, SSH, to identify genes selectively expressed in 
C57MG mouse mammary epithelial cells transformed by 
Wnt-1. 

Three of the genes isolated, WISP-1, WISP-2, and WISP-3, 
are members of the CCN family of growth factors, which 
includes CTGF, Cyr61, and nov, a family not previously linked 
to Wnt signaling. 

Two independent experimental systems demonstrated that 
WISP induction was associated with the expression of Wnt-1. 
The first was C57MG cells infected with a Wnt-1 retroviral 
vector or C57MG cells expressing Wnt-1 under the control of 
a tetracycline-repressible promoter, and the second was in
Wnt-1 transgenic mice, where breast tissue expresses Wnt-1, 
whereas normal breast tissue does not. No WISP RNA expres- 
sion was detected in mammary tumors induced by polyoma 
virus middle T antigen (data not shown). These data suggest 
a link between Wnt-1 and WISPs in that in these two situations, 
WISP induction was correlated with Wnt-1 expression. 

It is not clear whether the WISPs are directly or indirectly 
induced by the downstream components of the Wnt-1 signaling 
pathway (i.e., β-catenin-TCF-1/Lef1). The increased levels of
WISP RNA were measured in Wnt-1-transformed cells, hours
or days after Wnt-1 transformation. Thus, WISP expression
could result from Wnt-1 signaling directly through β-catenin
transcription factor regulation or alternatively through Wnt-1 
signaling turning on a transcription factor, which in turn 
regulates WISPs. 

The WISPs define an additional subfamily of the CCN family 
of growth factors. One striking difference observed in the 
protein sequence of WISP-2 is the absence of a CT domain, 
which is present in CTGF, Cyr61, nov, WISP-1, and WISP-3. 
This domain is thought to be involved in receptor binding and 
dimerization. Growth factors, such as TGF-β, platelet-derived
growth factor, and nerve growth factor, which contain a cystine 
knot motif exist as dimers (32). It is tempting to speculate that 
WISP-1 and WISP-3 may exist as dimers, whereas WISP-2 
exists as a monomer. If the CT domain is also important for 
receptor binding, WISP-2 may bind its receptor through a 
different region of the molecule than the other CCN family 
members. No specific receptors have been identified for CTGF 
or nov. A recent report has shown that integrin αvβ3 serves as
an adhesion receptor for Cyr61 (33). 

The strong expression of WISP-1 and WISP-2 in cells lying 
within the fibrovascular tumor stroma in breast tumors from 
Wnt-1 transgenic animals is consistent with previous obser- 
vations that transcripts for the related CTGF gene are pri- 
marily expressed in the fibrous stroma of mammary tumors 
(34). Epithelial cells are thought to control the proliferation of 
connective tissue stroma in mammary tumors by a cascade of 
growth factor signals similar to that controlling connective 
tissue formation during wound repair. It has been proposed 
that mammary tumor cells or inflammatory cells at the tumor 
interstitial interface secrete TGF-β1, which is the stimulus for
stromal proliferation (34). TGF-β1 is secreted by a large
percentage of malignant breast tumors and may be one of the 
growth factors that stimulates the production of CTGF and 
WISPs in the stroma. 

It was of interest that WISP-1 and WISP-2 expression was 
observed in the stromal cells that surrounded the tumor cells
(epithelial cells) in the Wnt-1 transgenic mouse sections of
breast tissue. This finding suggests that paracrine signaling 
could occur in which the stromal cells could supply WISP-1 and 
WISP-2 to regulate tumor cell growth on the WISP extracel- 
lular matrix. Stromal cell-derived factors in the extracellular 
matrix have been postulated to play a role in tumor cell 
migration and proliferation (35). The localization of WISP-1 
and WISP-2 in the stromal cells of breast tumors supports this 
paracrine model. 

An analysis of WISP-1 gene amplification and expression in 
human colon tumors showed a correlation between DNA 
amplification and overexpression, whereas overexpression of 
WISP-3 RNA was seen in the absence of DNA amplification.
In contrast, WISP-2 DNA was amplified in the colon tumors, 
but its mRNA expression was significantly reduced in the 
majority of tumors compared with the expression in normal 
colonic mucosa from the same patient. The gene for human 
WISP-2 was localized to chromosome 20q12-20q13, at a region
frequently amplified and associated with poor prognosis in
node negative breast cancer and many colon cancers, suggesting
the existence of one or more oncogenes at this locus
(36-38). Because the center of the 20q13 amplicon has not yet
been identified, it is possible that the apparent amplification 
observed for WISP-2 may be caused by another gene in this 
amplicon. 

A recent manuscript on rCop-1, the rat orthologue of 
WISP-2, describes the loss of expression of this gene after cell 
transformation, suggesting it may be a negative regulator of 
growth in cell lines (16). Although the mechanism by which 
WISP-2 RNA expression is down-regulated during malignant 
transformation is unknown, the reduced expression of WISP-2 
in colon tumors and cell lines suggests that it may function as 
a tumor suppressor. These results show that the WISP genes 
are aberrantly expressed in colon cancer and suggest that their 
altered expression may confer selective growth advantage to 
the tumor. 

Members of the Wnt signaling pathway have been impli- 
cated in the pathogenesis of colon cancer, breast cancer, and 
melanoma, including the tumor suppressor gene adenomatous 
polyposis coli and β-catenin (39). Mutations in specific regions
of either gene can cause the stabilization and accumulation of
cytoplasmic β-catenin, which presumably contributes to human
carcinogenesis through the activation of target genes such
as the WISPs. Although the mechanism by which Wnt-1 
transforms cells and induces tumorigenesis is unknown, the 
identification of WISPs as genes that may be regulated down- 
stream of Wnt-1 in C57MG cells suggests they could be 
important mediators of Wnt-1 transformation. The amplifica- 
tion and altered expression patterns of the WISPs in human 
colon tumors may indicate an important role for these genes 
in tumor development.

ABSTRACT The consistent cytogenetic translocation of
chronic myelogenous leukemia (the Philadelphia chromosome,
Ph1) has been observed in cells of multiple hematopoietic
lineages. This translocation creates a chimeric gene composed
of breakpoint-cluster-region (bcr) sequences from chromosome
22 fused to a portion of the abl oncogene on chromosome 9. The
resulting gene product (P210c-abl) resembles the transforming
protein of the Abelson murine leukemia virus in its structure
and tyrosine kinase activity. P210c-abl is expressed in
Ph1-positive cell lines of myeloid lineage and in clinical specimens
with myeloid predominance. We show here that Epstein-Barr
virus-transformed B-lymphocyte lines that retain Ph1 can
express P210c-abl. The level of expression in these B-cell lines is
generally lower and more variable than that observed for
myeloid lines. Protein expression is not related to amplification
of the abl gene but to variation in the level of bcr-abl mRNA
produced from a single Ph1 template.



Chronic myelogenous leukemia (CML) is a disease of the
pluripotent stem cell (1). In greater than 95% of patients, the
leukemic cells contain the cytogenetic marker known as the
Philadelphia chromosome, or Ph1 (2). This reciprocal
translocation event between the long arms of chromosomes
9 and 22 has been used as a disease-specific marker for
diagnosis and evaluation of therapy. Multiple hematopoietic
lineages, including myeloid and B-lymphoid, contain Ph1 in
early or chronic phase, as well as in the more acute accelerated
and blast crisis phases of the disease.

One molecular consequence of Ph1 is the translocation of
the chromosomal arm containing the c-abl gene on chromosome
9 into the middle of the breakpoint-cluster region (bcr)
gene on chromosome 22 (3-6). Although the precise
translocation breakpoints are variable, an RNA-splicing
mechanism generates a very similar 8-kilobase (kb) mRNA in
each case (5-9). The hybrid bcr-abl message encodes a
structurally altered form of the abl oncogene product, called
P210c-abl (10-13), with an amino-terminal segment derived
from a portion of the exons of bcr on chromosome 22 and a
carboxyl-terminal segment derived from a major portion of
the exons of the c-abl gene on chromosome 9. The chimeric
structure of bcr-abl and the resulting P210c-abl is similar to the
structure of the Abelson murine leukemia virus gag-abl
genome and resulting P160v-abl transforming gene product.
Both proteins have very similar tyrosine kinase activities (10,
11, 14) which can be distinguished by their relative stability
to denaturing detergents and by their ATP requirements from
the recently described tyrosine kinase activity of the c-abl
gene product (15).

In concert with structural modification of the amino-terminal
portion of the abl gene, increased level of expression
has been implicated in activation of c-abl oncogenic potential.
Myeloid and erythroid cell lines and clinical samples
derived from acute-phase CML patients contain about 10-fold
higher levels of the 8-kb bcr-abl mRNA and P210c-abl than
the c-abl mRNA forms (6 and 7 kb) and P145c-abl gene product
(5, 8, 9, 11). The higher level of expression of the chimeric
bcr-abl message in acute-phase cells is not likely to be solely
due to the presence of the bcr promoter sequences at the 5'
end of the gene, since the normal 4.5-kb and 6.7-kb bcr-encoded
mRNA species are expressed at an even lower level
than the normal c-abl messages (5, 6).

We have analyzed a series of Epstein-Barr virus-immortalized
B-lymphoid cell lines derived from CML patients (16).
With such in vitro clonal cell lines, we can evaluate whether
the presence of Ph1 always results in synthesis of the chimeric
bcr-abl message and protein, and whether the quantitative
expression varies for cells of B-lymphoid lineage as compared
to previously examined myeloid cell lines. Our results
show that cell lines that retain Ph1 do express bcr-abl message
and protein, but that the level is generally lower and more
variable than previously seen for myeloid cell lines. The
demonstration that the Ph1 chromosomal template can vary
in its level of expression of P210c-abl suggests that secondary
mechanisms, beyond the translocation itself, contribute to
the regulation of the bcr-abl gene in different cell types or
subclones that derive from the affected stem cell.

MATERIALS AND METHODS 

Cells and Cell Labelings. Epstein-Barr virus-transformed
B-lymphoid cell lines were established from peripheral blood
samples of chronic- and acute-phase CML patients as reported
(16). The cell lines are designated according to patient
number, karyotype, and lineage. For example,
SK-CML7Bt(9,22)-33 refers to CML patient 7, B-lymphoid cell
line, 9;22 translocation (Ph1), cell line 33; and SK-CML7BN-2
refers to B-cell line 2 with a normal karyotype derived from
the same patient. Repeat karyotype analysis was performed
to verify the retention of Ph1 just prior to analysis for abl
protein and RNA. Cells were maintained in RPMI 1640
medium with 20% fetal bovine serum. We have not observed
any consistent pattern of in vitro growth rate that correlates
to the stage of disease at the time of transformation with
Epstein-Barr virus. Cells (1.5 x 10^7) were washed twice with
Dulbecco's modified Eagle's medium lacking phosphate and
supplemented with 5% dialyzed fetal bovine serum. Cells
were then resuspended in 2 ml of the minimal medium.
Labeling was started with the addition of [32P]orthophosphate
(1 mCi/ml; ICN; 1 Ci = 37 GBq) and continued at 37°C
for 3-4 hr.

Immunoprecipitation and Immunoblotting. Immunoprecipitations
were carried out as described (10). Cells (1.5 x 10^7)
were washed with phosphate-buffered saline and extracted
with 3-5 ml of phosphate lysis buffer (1% Triton X-100/0.1%
NaDodSO4/0.5% deoxycholate/10 mM Na2HPO4, pH 7.5/
100 mM NaCl) with 5 mM EDTA and 5 mM phenylmethylsulfonyl
fluoride. Extracts were clarified by centrifugation
and precipitated with normal or rabbit anti-abl sera (anti-pEX-2
or anti-pEX-5) (17). The precipitated proteins were
electrophoresed in a NaDodSO4/8% polyacrylamide gel.
32P-labeled proteins were detected by autoradiography.
Alternatively, abl proteins were detected by immunoblotting.
Extracts from unlabeled cells were clarified, and proteins
were concentrated by immunoprecipitation with rabbit antisera
against abl-encoded proteins [anti-pEX-2 and anti-pEX-5
combined (17)] and then fractionated in 8% acrylamide gels.
The proteins were transferred from the gel to nitrocellulose
filters, using protease-facilitated transfer (18). The abl-encoded
proteins were detected using murine monoclonal
antibodies as a probe and peroxidase-conjugated goat anti-mouse
second stage antibody (Bio-Rad) for development.
Rabbit antisera and mouse monoclonal antibodies to abl
proteins were prepared using bacterially expressed regions of
the v-abl protein as immunogens (17, 19). Anti-pEX-2 antibodies
react with the internal tyrosine kinase domain and
anti-pEX-5 antibodies react with the carboxyl-terminal segment
of the abl proteins.

RNA Analysis. RNA was extracted from 10^8 cells by the
NaDodSO4/urea/phenol method (20). Polyadenylylated
RNA was purified by oligo(dT) affinity chromatography.
Samples were electrophoresed in a 1% agarose/formaldehyde
gel and transferred to nitrocellulose. abl RNA species
were detected by hybridization with a nick-translated v-abl
fragment probe (21).

DNA Analysis. DNA was prepared from 5 x 10^7 cells of
each cell line and processed for Southern blots with a v-abl 
probe as described (21). 

RESULTS 

Variable Levels of P210c-abl Are Detected in Ph1-Positive Cell
Lines. Ph1-positive and Ph1-negative, Epstein-Barr virus-transformed
B-lymphocyte cell lines derived from the same
patient were examined for P210c-abl synthesis by
immunoprecipitation of [32P]orthophosphate-labeled cell extracts
with anti-abl sera (Fig. 1). The normal c-abl protein P145c-abl
was detected at a similar level in multiple Ph1-positive and
Ph1-negative cell lines. P210c-abl was only detected in the
Ph1-positive cell lines because the bcr-abl chimeric gene
which encodes P210c-abl resides on the Ph1 (4, 5, 11, 13). The
level of P210c-abl was about 4- to 5-fold higher than the level
of P145c-abl in the SK-CML7Bt-33 cell line (Fig. 1A, +). The
Ph1-positive erythroid-progenitor cell line K562 (C) showed
a level of P210c-abl about 10-fold higher than P145c-abl.
However, the level of P210c-abl was about one-fifth that of
P145c-abl in the Ph1-positive SK-CML16Bt-1 cell line (Fig. 1B,
+). Comparison of different autoradiographic exposures
roughly indicated that the level of P210c-abl varies over a
20-fold range between these Ph1-positive B-cell lines. Analysis
of four additional Ph1-positive B-cell lines demonstrated
that the level of P210c-abl fell into two general classes; some
cell lines had a level of P210c-abl similar to SK-CML7Bt-33
and others had the low level similar to SK-CML16Bt-1 (Table
1). This differs from previous studies with Ph1-positive
myeloid cell lines and patient samples derived from acute-



Fig. 1. Detection of variable levels of P210c-abl in Ph1-positive
B-cell lines. Production of P145c-abl and P210c-abl in Epstein-Barr
virus-transformed B-cell lines derived from a blast-crisis (A) and a
chronic-phase (B) CML patient was examined by metabolic labeling
with [32P]orthophosphate and immunoprecipitation. Ph1-negative
(-) and Ph1-positive (+) cell lines derived from each patient were
analyzed. The Ph1-negative cell line in A,- is SK-CML7BN-2 and in
B,- is SK-CML16BN-1. The Ph1-positive cell line in A,+ is
SK-CML7Bt-33 and in B,+ is SK-CML16Bt-1. The K562 cell line, a
Ph1-positive erythroid progenitor cell line spontaneously derived
from a blast-crisis patient (33), is represented in C. Cells (1.5 x 10^7)
were metabolically labeled with 2 mCi of [32P]orthophosphate for 3-4
hr and then were extracted and clarified by centrifugation. Samples
were immunoprecipitated with control normal serum (lanes 1),
anti-pEX-2 (lanes 2), or anti-pEX-5 (lanes 3) and analyzed by
NaDodSO4/8% PAGE followed by autoradiography with an
intensifying screen (3 days for A and C, 10 days for B).

phase CML patients, in which P210c-abl was detected at a
10-fold higher level than P145c-abl (refs. 10 and 11; Table 1).
There was no large difference in level of chimeric mRNA and
P210c-abl expressed in four myeloid/erythroid-lineage
Ph1-positive cell lines (K562, EM2, EM3, CML22, and BV173;
refs. 9 and 11), despite a 4- to 5-fold amplification of
abl-related sequences in the K562 cell line.

Detection of different levels of P210c-abl in Fig. 1 could be
due to decreased phosphorylation of P210c-abl, a lower level
of P210c-abl synthesis, or altered stability of the protein. To
help distinguish among these possibilities, the steady-state
level of P210c-abl in the cell lines was assayed by immunoblotting.
The results show that SK-CML7Bt-33 (Fig. 2A, +)
had a higher level of P210c-abl than P145, similar to the results
with metabolic labeling (Fig. 1). We did not detect P210c-abl
by immunoblotting with 2 x 10^7 cells of line SK-CML8Bt-3
(Fig. 2B, +). Reconstruction experiments using dilutions of
cell extracts showed that we could detect about 5-10% the
level of P210c-abl expressed in the K562 cell line (data not
shown). We infer that the steady-state level of P210c-abl in
SK-CML8Bt-3 is lower than the level in SK-CML7Bt-33 by
a factor of at least 10. The level of P210c-abl detected in these
assays correlated with the amount of P210c-abl tyrosine kinase
activity that could be detected in vitro (data not shown).

Different Levels of P210c-abl Are Reflected in the Amount of
Stable bcr-abl mRNA. To identify the basis for detection of
variable levels of P210c-abl, we examined the production of
the abl RNA. RNA blot hybridization analysis using a v-abl
probe (Fig. 3) showed that the normal 6- and 7-kb c-abl
mRNAs were present at a similar level in Ph1-positive and
-negative cell lines derived from different patients. However,
the 8-kb mRNA that encodes P210c-abl was detected at a
10-fold higher level in SK-CML7Bt-33 (Fig. 3A, +) than in
SK-CML16Bt-1 (B, +), which correlated with the relative
level of P210c-abl detected in each cell line. Analysis of
additional cell lines demonstrated that the level of 8-kb RNA
directly correlated with the level of P210c-abl (Table 1). The
variation in level of 8-kb RNA detected in these cell lines was
not due to loss or gain of Ph1, because cytogenetic analysis
confirmed the presence of Ph1 in these cell lines (ref. 16 and

Table 1. Relative levels of bcr-abl expression in Epstein-Barr
virus-immortalized B-cell lines and myeloid CML lines

Cell line*       CML phase†   Ph1‡   P210c-abl§   8-kb mRNA¶
SK-CML7BN-2      BC           -      -            -
SK-CML8BN-10     Chronic      -      -            -
SK-CML8BN-12     Chronic      -      -            -
SK-CML16BN-1     Chronic
SK-CML35BN-1     Chronic
SK-CML7Bt-33     BC           +      +++          +++
SK-CML21Bt-1     Acc          +      +++          +++
SK-CML21Bt-6     Acc          +      +++          +++
SK-CML8Bt-3      Chronic      +      +
SK-CML16Bt-1     Chronic      +      +
SK-CML35Bt-2     Chronic      +      +            +
K562             BC           +      +++++        +++++
BV173            BC           +      +++++        +++++
EM2              BC           +      +++++        +++++

*Cell lines derived from CML patients by transformation with
Epstein-Barr virus as described (16). Names of cell lines indicate
patient number and Ph1 status: SK-CML7Bt indicates a cell line
derived from patient 7 that carries the 9;22 Ph1 translocation; N
indicates a normal karyotype. Myeloid-erythroid cell lines (K562,
EM2, and BV173) are described in previous publications (9, 11, 22,
33).

†Status of patient at the time cell line was derived. BC, blast crisis;
Acc, accelerated phase.

‡Presence (+) or absence (-) of Ph1 as demonstrated by karyotypic
or Southern blot analysis.

§P210c-abl detected as described in legend to Fig. 1. B-cell lines
derived from blast-crisis and accelerated-phase patients had levels
of P210 3- to 5-fold higher (+++) than levels of P145.
Chronic-phase-derived cell lines had P210 levels lower than or just
equivalent (+) to the level of P145. Myeloid and erythroid lines had
levels of P210 5- to 10-fold higher than P145 (+++++).

¶Eight-kilobase bcr-abl mRNA detected as described in legend to
Fig. 2. Symbols: ±, borderline detectable; +++++, level of 8-kb
mRNA 5- to 10-fold higher than that of the 6- and 7-kb c-abl mRNA
species; +++, level of 8-kb mRNA 3- to 5-fold higher than that of
the 6- and 7-kb species; +, a level approximately equivalent to that
of the 6- and 7-kb messages.

data not shown). There was no difference in the copy number
of abl-related sequences as judged by Southern blot analysis
(Fig. 4). Only the K562 cell line control showed an amplification
of abl sequences, as previously reported (22, 23).
These combined data suggest that differential bcr-abl mRNA
expression from a single gene template is responsible for the
variable levels of P210c-abl detected. This could be mediated





[Fig. 2 immunoblot: band positions marked P210 and P145.]

Fig. 2. Analysis of steady-state abl protein levels by immunoblotting.
Cell extracts prepared from 2 x 10^7 cells of lines
SK-CML7BN-2 (A,-), SK-CML7Bt-33 (A,+), SK-CML8BN-10 (B,-),
and SK-CML8Bt-3 (B,+) were concentrated by immunoprecipitation
with anti-pEX-2 plus anti-pEX-5. Samples were then electrophoresed
in a NaDodSO4/8% polyacrylamide gel and transferred to
nitrocellulose, using protease-facilitated transfer (18). abl proteins
were detected using a mixture of two monoclonal antibodies directed
against the pEX-2 and pEX-5 abl-protein fragments produced in
bacteria (19) as a probe and a peroxidase-conjugated goat anti-mouse
second-stage antibody (Bio-Rad) for development.



[Fig. 3 RNA blot: size markers at 8, 7, and 6 kb.]

Fig. 3. Comparison of abl RNA levels in Ph1-positive and
-negative B-cell lines. The levels of the normal 6- and 7-kb c-abl
RNAs and the 8-kb bcr-abl RNA were analyzed by blot hybridization
using a v-abl probe. RNA was extracted from Ph1-negative lines
SK-CML7BN-2 (A,-) and SK-CML16BN-1 (B,-), from Ph1-positive
lines SK-CML7Bt-33 (A,+) and SK-CML16Bt-1 (B,+), and
from line K562 (C,+) by the NaDodSO4/urea/phenol method (20).
Polyadenylylated RNA was purified by oligo(dT) affinity
chromatography, and 15 µg of each sample was electrophoresed in a 1%
agarose/formaldehyde gel and then transferred to nitrocellulose. The
blotted RNAs were hybridized with a nick-translated v-abl fragment
probe (21) and then autoradiographed for 4 days.



by factors influencing the transcription rate of the bcr-abl 
gene or the stability of the mRNA. 

DISCUSSION 

Several lines of evidence suggest that formation of Ph1 is not
the primary event that affects the stem cell in CML. Patients
have been identified that present with the clinical picture of
CML but only later develop Ph1 (1). This observation,
coupled with studies of G6PD (glucose-6-phosphate
dehydrogenase)-heterozygous females with CML that demonstrate
stem-cell clonality by isozyme analysis among cell



Fig. 4. Southern blot analysis of abl sequences in Ph1-positive
and -negative B-cell lines. High molecular weight DNA (15 µg) was
digested with restriction endonuclease BamHI, separated in a 0.8%
agarose gel, and then transferred to nitrocellulose. The blotted DNA
fragments were hybridized with a nick-translated, 2.4-kb Bgl II v-abl
fragment (1.5 x 10^8 cpm/µg; ref. 21) and exposed for 4 days. (A)
Autoradiogram of abl-specific fragments in cell lines HL-60 (lane 1),
EM2 (lane 2), K562 (lane 3), SK-CML7Bt-33 (lane 4), SK-CML8Bt-3
(lane 5), SK-CML16Bt-1 (lane 6), SK-CML21Bt-6 (lane 7),
SK-CML35Bt-2 (lane 8), SK-CML7BN-2 (lane 9), SK-CML8BN-2 (lane
10), and SK-CML35BN-1 (lane 11). (B) Ethidium bromide staining of
agarose gel prior to transfer to nitrocellulose, showing the level of
variation in amount of DNA loaded per lane.

populations that lack the Ph1 marker, supports a secondary
or complementary role for Ph1 in the progression of the
disease (24, 25). This chromosome marker is found in
chronic, accelerated, and blast-crisis phases of the disease. It
is likely that Ph1 confers some growth advantage, since cells
with the marker chromosome eventually predominate the
marrow and peripheral blood even in chronic phase. During
the phase of blast crisis, many patients develop additional
chromosome abnormalities, including duplication of Ph1, a
variety of trisomies, and complex translocations (26). This
is suggestive evidence for Ph1 being a necessary but not
sufficient genetic change for the full evolution of the
disease.

The realization that one molecular result of Ph1 is the
generation of a chimeric bcr-abl protein with functional
characteristics and structure analogous to the gag-abl transforming
protein of the Abelson murine leukemia virus
strengthens the argument for an important role of Ph1 in the
pathogenesis of CML. Although the Abelson virus is gener- 
ally considered a rapidly transforming retrovirus, its effects 
can range from overcoming growth factor requirements, to 
cellular lethality, to induction of highly oncogenic tumors in 
a number of hematopoietic cell lineages (27, 28). Even in the 
transformation of murine cell targets, there are several lines 
of evidence that suggest that the growth-promoting activity of 
the v-abl gene product is complemented by further cellular 
changes in the production of the malignant-cell phenotype 
(29-31). 

The regulation of bcr-abl gene expression is complex 
because the 5' end of the gene is derived from the non-abl 
sequences, bcr, normally found on chromosome 22 (6). The 
level of stable message for the normal bcr gene and the 
normal abl gene are both much lower than the level of the 
bcr-abl message and protein from cell lines and clinical 
specimens derived from myeloid blast-crisis patients (5, 6, 
11). Therefore, the high level of bcr-abl expression cannot 
simply be attributed to the regulatory sequences associated 
with bcr. Possibly, creation of the chimeric gene disrupts the 
normal regulatory sequences and results in a higher level of 
expression. Variation in bcr-abl expression may result from 
secondary changes in the structure of the chimeric gene or 
function of trans-acting factors that occur during evolution of 
the disease. Our analysis of P210 c-abl and the 8-kb mRNA in 
Epstein-Barr virus-transformed Ph 1-positive B-cell lines 
demonstrates that stable message and protein levels from the 
bcr-abl gene can vary over a wide range. This variation does 
not result from a change in the number of bcr-abl templates 
secondary to gene amplification but more likely from changes 
in either transcription rate or mRNA stability. We suspect 
this range of bcr-abl expression is not limited to lymphoid 
cells. Analysis of peripheral blood leukocytes derived from 
an unusual CML patient who has been in chronic phase with 
myeloid predominance for 16 years showed a level of 
P210 c-abl one-fifth that of P145 c-abl, as detected by metabolic 
labeling with [32P]orthophosphate and immunoprecipitation 
(S.C., O.N.W., and P. Greenberg, unpublished observations). 
Lower levels of expression of the chimeric mRNA 
have been demonstrated in clinical samples from 
chronic-phase CML patients compared to acute-phase CML patients 
(9). Others have reported chronic-phase patients with vari- 
able but, in some cases, relatively high levels of the bcr-abl 
mRNA (32). The sampling variation and the heterogeneous 
mixture of cell types in clinical samples complicate such 
analyses. Further work is needed to evaluate whether there 
is a defined change in P210 c-abl expression during the 
progression of CML. It is interesting to note that among the 
limited sample of Ph 1-positive B-cell lines we have examined 
(Table 1), we have seen higher levels of P210 c-abl in those 
derived from patients at more advanced stages of the disease. 



It will be important to search for cell-type-specific 
mechanisms that might regulate expression of bcr-abl from Ph 1. 

Proteome analysis: Biological assay or data archive? 

In this review we examine the current state of proteome analysis. There are 
three main issues discussed: why it is necessary to study proteomes; how 
proteomes can be analyzed with current technology; and how proteome analysis 
can be used to enhance biological research. We conclude that proteome 
analysis is an essential tool in the understanding of regulated biological systems. 
Current technology, while still mostly limited to the more abundant proteins, 
enables the use of proteome analysis both to establish databases of proteins 
present, and to perform biological assays involving measurement of multiple 
variables. We believe that the utility of proteome analysis in future biological 
research will continue to be enhanced by further improvements in analytical 
technology. 

1 Introduction 

A proteome has been defined as the protein complement 
expressed by the genome of an organism, or, in multicellular 
organisms, as the protein complement expressed by a 
tissue or differentiated cell [1]. In the most common 
implementation of proteome analysis the proteins extracted 
from the cell or tissue analyzed are separated by high 
resolution two-dimensional gel electrophoresis (2-DE), 
detected in the gel and identified by their amino acid 
sequence. The ease, sensitivity and speed with which 
gel-separated proteins can be identified by the use of recently 
developed mass spectrometric techniques have dramatically 
increased the interest in proteome technology. One 
of the most attractive features of such analyses is that 
complex biological systems can potentially be studied in their 
entirety, rather than as a multitude of individual components. 
This makes it far easier to uncover the many complex, 
and often obscure, relationships between mature 
gene products in cells. Large-scale proteome characterization 
projects have been undertaken for a number of different 
organisms and cell types. Microbial proteome projects 
currently in progress include, for example: Saccharomyces 
cerevisiae [2], Salmonella enterica [3], Spiroplasma 
melliferum [4], Mycobacterium tuberculosis [5], Ochrobactrum 
anthropi [6], Haemophilus influenzae [7], Synechocystis 
spp. [8], Escherichia coli [9], Rhizobium leguminosarum 
[10], and Dictyostelium discoideum [11]. Proteome 
projects underway for tissues of more complex organisms 
include those for: human bladder squamous cell 
carcinomas [12], human liver [13], human plasma [13], 
human keratinocytes [12], human fibroblasts [12], mouse 
kidney [12], and rat serum [14]. In this manuscript we 
critically assess the concept of proteome analysis and the 
technical feasibility of establishing complete proteome 
maps, and discuss ways in which proteome analysis and 
biological research intersect. 



2 Rationale for proteome analysis 

The dramatic growth in both the number of genome 
projects and the speed with which genome sequences 
are being determined has generated huge amounts of 
sequence information, for some species even complete 
genomic sequences [15-17]. The description of the 
state of a biological system by the quantitative measure- 
ment of system components has long been a primary 
objective in molecular biology. With recent technical 
advances including the development of differential dis- 
play-PCR [18], cDNA microarray and DNA chip techno- 
logy [19, 20] and serial analysis of gene expression 
(SAGE) [21, 22], it is now feasible to establish global and 
quantitative mRNA expression maps of cells and tissues, 
in which the sequence of all the genes is known, at a 
speed and sensitivity which is not matched by current 
protein analysis technology. Given the long-standing 
paradigm in biology that DNA synthesizes RNA which 
synthesizes protein, and the ability to rapidly establish 
global, quantitative mRNA expression maps, the questions 
which arise are why technically complex proteome 
projects should be undertaken and what specific types of 
information could be expected from proteome projects 
which cannot be obtained from genomic and transcript 
profiling projects. We see three main reasons for proteome 
analysis to become an essential component in the 
comprehensive analysis of biological systems: (i) protein 
expression levels are not predictable from the mRNA 
expression levels, (ii) proteins are dynamically modified 
and processed in ways which are not necessarily 
apparent from the gene sequence, and (iii) proteomes 
are dynamic and reflect the state of a biological system. 

2.1 Correlation between mRNA and protein expression 
levels 

Interpretations of quantitative mRNA expression profiles 
frequently implicitly or explicitly assume that for specific 
genes the transcript levels are indicative of the levels of 
protein expression. As part of an ongoing study in our 
laboratory, we have determined the correlation of expression 
at the mRNA and protein levels for a population of 
selected genes in the yeast Saccharomyces cerevisiae 
growing at mid-log phase (S. P. Gygi et al., submitted for 
publication). mRNA expression levels were calculated 
from published SAGE frequency tables [22]. Protein 
expression levels were quantified by metabolic radiolabeling 
of the yeast proteins, liquid scintillation counting 
of the protein spots separated by high resolution 2-DE 
and mass spectrometric identification of the protein(s) 
migrating to each spot. The selected 80 samples constitute 
a relatively homogeneous group with respect to predicted 
half-life and expression level of the protein products. 
Thus far, we have found a general trend but no 
strong correlation between protein and transcript levels 
(Fig. 1). For some genes studied equivalent mRNA transcript 
levels translated into protein abundances which 
varied by more than 50-fold. Similarly, equivalent steady-state 
protein expression levels were maintained by transcript 
levels varying by as much as 40-fold (S. P. Gygi 
et al., submitted). These results suggest that even for a 
population of genes predicted to be relatively homogeneous 
with respect to protein half-life and gene expression, 
the protein levels cannot be accurately predicted 
from the level of the corresponding mRNA transcript. 
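
The comparison described in this section can be illustrated with a minimal sketch. It assumes paired mRNA and protein abundance values for a set of genes; the function names and the numbers in the usage section are hypothetical and are not data from the study. It computes a Pearson correlation coefficient and the fold range of protein abundance among genes with similar transcript levels.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def fold_range(values):
    """Ratio of the largest to the smallest value (all must be positive)."""
    values = list(values)
    if not values or min(values) <= 0:
        raise ValueError("need at least one value, all positive")
    return max(values) / min(values)


if __name__ == "__main__":
    # Hypothetical abundances in arbitrary units, for illustration only.
    mrna = [1.0, 1.0, 1.1, 5.0, 10.0]
    protein = [2.0, 100.0, 30.0, 60.0, 400.0]
    print("r =", round(pearson_r(mrna, protein), 3))
    # Protein fold range among genes with near-identical mRNA levels.
    similar = [p for m, p in zip(mrna, protein) if m <= 1.1]
    print("fold range =", fold_range(similar))
```

A high fold range among genes with near-equal transcript levels, together with a modest correlation coefficient, is the kind of pattern the text describes as a general trend without a strong correlation.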

2.2 Proteins are dynamically modified and processed 

In the mature, biologically active form many proteins are 
post-translationally modified by glycosylation, phosphorylation, 
prenylation, acylation, ubiquitination or one or 
more of many other modifications [23] and many proteins 
are only functional if specifically associated or complexed 
with other molecules, including DNA, RNA, proteins 
and organic and inorganic cofactors. Frequently, 
modifications are dynamic and reversible and may alter 
the precise three-dimensional structure and the state of 
activity of a protein. Collectively, the state of modification 
of the proteins which constitute a biological system 
are important indicators for the state of the system. The 
type of protein modification and the sites modified at a 
specific cellular state can usually not be determined 
from the gene sequence alone. 

Figure 1. Correlation between mRNA and protein levels in yeast cells. 
For a selected population of 80 genes, protein levels were measured 
by 35S radiolabeling and mRNA levels were calculated from published 
SAGE tables. Inset: expanded view of the low abundance region. 
For more experimental details, also see Figs. 5 and 6 (S. P. Gygi et al., 
submitted). 

2.3 Proteomes are dynamic and reflect the state of a 
biological system 

A single genome can give rise to many qualitatively and 
quantitatively different proteomes. Specific stages of the 
cell cycle and states of differentiation, responses to 
growth and nutrient conditions, temperature and stress, 
and pathological conditions represent cellular states 
which are characterized by significantly different proteomes. 
The proteome, in principle, also reflects events 
that are under translational and post-translational control. 
It is therefore expected that proteomics will be able 
to provide the most precise and detailed molecular description 
of the state of a cell or tissue, provided that the 
external conditions defining the state are carefully determined. 
In answer to the question of whether the study 
of proteomes is necessary for the analysis of biomolecular 
systems, it is evident that the analysis of mature protein 
products in cells is essential as there are numerous 
levels of control of protein synthesis, degradation, 
processing and modification, which are only apparent by 
direct protein analysis. 



3 Description and assessment of current proteome 
analysis technology 

3.1 Technical requirements of proteome technology 

In biological systems the level of expression as well as 
the states of modification, processing and macro-molecular 
association of proteins are controlled and modulated 
depending on the state of the system. Comprehensive 
analysis of the identity, quantity and state of modification 
of proteins therefore requires the detection and 
quantitation of the proteins which constitute the system, 
and analysis of differentially processed forms. There are 
a number of inherent difficulties in protein analysis 
which complicate these tasks. First, proteins cannot be 
amplified. It is possible to produce large amounts of a 
particular protein by over-expression in specific cell systems. 
However, since many proteins are dynamically 
post-translationally modified, they cannot be easily amplified 
in the form in which they finally function in the 
biological system. It is frequently difficult to purify from 
the native source sufficient amounts of a protein for 
analysis. From a technological point of view this translates 
into the need for high sensitivity analytical techniques. 
Second, many proteins are modified and processed 
post-translationally. Therefore, in addition to the 
protein identity, the structural basis for differentially 
modified isoforms also needs to be determined. The distribution 
of a constant amount of protein over several 
differentially modified isoforms further reduces the 
amount of each species available for analysis. The complexity 
and dynamics of post-translational protein editing 
thus significantly complicate proteome studies. 
Third, proteins vary dramatically with respect to their 
solubility in commonly used solvents. There are few, if 
any, solvent conditions in which all proteins are soluble 
and which are also compatible with protein analysis. This 
makes the development of protein purification methods 
particularly difficult since both protein purification and 
solubility have to be achieved under the same conditions. 
Detergents, in particular sodium dodecyl sulfate 
(SDS), are frequently added to aqueous solvents to 
maintain protein solubility. The compatibility with SDS 
is a big advantage of SDS polyacrylamide gel electrophoresis 
(SDS-PAGE) over other protein separation 
techniques. Thus, SDS-PAGE and two-dimensional gel 
electrophoresis, which also uses SDS and other detergents, 
are the most general and preferred methods for 
the purification of small amounts of proteins, provided 
that activity does not necessarily need to be maintained. 
Lastly, the number of proteins in a given cell system is 
typically in the thousands. Any attempt to identify and 
categorize all of these must use methods which are as 
rapid as possible to allow completion of the project 
within a reasonable time frame. Therefore, a successful, 
general proteomics technology requires high sensitivity, 
high throughput, the ability to differentiate differentially 
modified proteins, and the ability to quantitatively display 
and analyze all the proteins present in a sample. 

3.2 2-D electrophoresis - mass spectrometry: a common 
implementation of proteome analysis 

The most common currently used implementation of 
proteome analysis technology is based on the separation 
of proteins by two-dimensional (IEF/SDS-PAGE) gel 
electrophoresis and their subsequent identification and 
analysis by mass spectrometry (MS) or tandem mass 
spectrometry (MS/MS). In 2-DE, proteins are first separated 
by isoelectric focusing (IEF) and then by SDS-PAGE, 
in the second, perpendicular dimension. Separated 
proteins are visualized at high sensitivity by staining 
or autoradiography, producing two-dimensional arrays of 
proteins. 2-DE gels are, at present, the most commonly 
used means of global display of proteins in complex 
samples. The separation of thousands of proteins has 
been achieved in a single gel [24, 25] and differentially 
modified proteins are frequently separated. Due to the 
compatibility of 2-DE with high concentrations of detergents, 
protein denaturants and other additives promoting 
protein solubility, the technique is widely used. 

The second step of this type of proteome analysis is the 
identification and analysis of separated proteins. Individual 
proteins from polyacrylamide gels have traditionally 
been identified using N-terminal sequencing [26, 27], 
internal peptide sequencing [28, 29], immunoblotting or 
comigration with known proteins [30]. The recent dramatic 
growth of large-scale genomic and expressed 
sequence tag (EST) sequence databases has resulted in a 
fundamental change in the way proteins are identified by 
their amino acid sequence. Rather than by the traditional 
methods described above, protein sequences are now frequently 
determined by correlating mass spectral or 
tandem mass spectral data of peptides derived from proteins 
with the information contained in sequence databases 
[31-33]. 

There are a number of alternative approaches to pro- 
teome analysis currently under development. There is 
considerable interest in developing a proteome analysis 
strategy which bypasses 2-DE altogether, because it is 
considered a relatively slow and tedious process, and 
because of perceived difficulties in extracting proteins 
from the gel matrix for analysis. However, 2-DE as a 
starting point for proteome analysis has many advan- 
tages compared to other techniques available today. The 
most significant strengths of the 2-DE-MS approach 
include the relatively uniform behavior of proteins in 
gels, the ability to quantify spots and the high resolution 
and simultaneous display of hundreds to thousands of 
proteins within a reasonable time frame. 

A schematic diagram of a typical procedure of the identification 
of gel-separated proteins is shown in Fig. 2. Protein 
spots detected in the gel are enzymatically or chemically 
fragmented and the peptide fragments are isolated 
for analysis, as already indicated, most frequently by MS 
or MS/MS. There are numerous protocols for the generation 
of peptide fragments from gel-separated proteins. 
They can be grouped into two categories, digestion in 
the gel slice [28, 34] or digestion after electrotransfer out 
of the gel onto a suitable membrane ([29, 35-37] and 
reviewed in [38]). In most instances either technique is 
applicable and yields good results. The analysis of MS or 
MS/MS data is an important step in the whole process 
because MS instruments can generate an enormous 
amount of information which cannot easily be managed 
manually. Recently, a number of groups have developed 
software systems dedicated to the use of peptide MS 
and MS/MS spectra for the identification of proteins. 
Proteins are identified by correlating the information 
contained in the MS spectra of protein digests or 
MS/MS spectra of individual peptides with data contained 
in DNA or protein sequence databases. 

[Figure 2 caption fragment] MS spectrum, peptide fragment masses 
from CID spectra of peptides, or a combination of both. 

The systems we are currently using in our laboratory are 
based on the separation of the peptides contained in protein 
digests by narrow bore or capillary liquid chromatography 
[39, 40] or capillary electrophoresis [41], the analysis 
of the separated peptides by electrospray ionization 
(ESI) MS/MS, and the correlation of the generated 
peptide spectra with sequence databases using the 
SEQUEST program developed at the University of Washington 
[32, 33]. The system automatically performs the 
following operations: a particular peptide ion characterized 
by its mass-to-charge ratio is selected in the MS out 
of all the peptide ions present in the system at a particular 
time; the selected peptide ion is collided in a collision 
cell with argon (collision-induced dissociation, 
CID) and the masses of the resulting fragment ions are 
determined in the second sector of the tandem MS; this 
experimentally determined CID spectrum is then correlated 
with the CID spectra predicted from all the peptides 
in a sequence database which have essentially the 
same mass as the peptide selected for CID; this correlation 
matches the isolated peptide with a sequence segment 
in a database and thus identifies the protein from 
which the peptide was derived. There are a number of 
alternative programs which use peptide CID spectra for 
protein identification, but we use the SEQUEST system 
because it is currently the most highly automated program 
and has proven to be successful, versatile and 
robust. 
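
The database-correlation steps listed above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the SEQUEST program: the scoring is a plain shared-peak count swapped in for the spectral correlation the text describes, only singly charged b and y fragment ions are predicted, and the function names, tolerances and the two-protein database are hypothetical. The residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(seq):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def fragment_mz(seq):
    """Predicted m/z values of singly charged b and y ions."""
    ions = []
    for i in range(1, len(seq)):
        b_ion = sum(RESIDUE_MASS[aa] for aa in seq[:i]) + PROTON
        y_ion = sum(RESIDUE_MASS[aa] for aa in seq[i:]) + WATER + PROTON
        ions.extend([b_ion, y_ion])
    return ions


def shared_peaks(observed, predicted, tol):
    """Number of predicted ions lying within tol of an observed peak."""
    return sum(1 for p in predicted
               if any(abs(p - o) <= tol for o in observed))


def identify(precursor_mass, observed, database, mass_tol=1.0, frag_tol=0.5):
    """Return (score, protein, peptide) for the best candidate, or None.

    Only peptides whose mass is essentially that of the selected
    precursor are scored, mirroring the mass filter described in the text.
    """
    best = None
    for protein, peptides in database.items():
        for pep in peptides:
            if abs(peptide_mass(pep) - precursor_mass) > mass_tol:
                continue
            score = shared_peaks(observed, fragment_mz(pep), frag_tol)
            if best is None or score > best[0]:
                best = (score, protein, pep)
    return best


if __name__ == "__main__":
    # Hypothetical database: two peptides share a mass but not a sequence.
    db = {"protA": ["PEPTIDEK", "GASGASK"], "protB": ["KEDITPEP"]}
    spectrum = fragment_mz("PEPTIDEK")  # stands in for a measured CID spectrum
    print(identify(peptide_mass("PEPTIDEK"), spectrum, db))
```

The example database contains two peptides of identical composition, so the precursor mass alone cannot separate them; only the fragment ions do, which is why the CID spectrum is needed in addition to the peptide mass.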

3.3 Protein identification by LC-MS/MS, capillary 
LC-MS/MS and CE-MS/MS 

It has been demonstrated repeatedly that MS has a very 
high intrinsic sensitivity. For the routine analysis of 
gel-separated proteins at high sensitivity the most significant 
challenge is the handling of small amounts of 
sample. The crux of the problem is the extraction and 
transferal of peptide mixtures generated by the digestion 
of low nanogram amounts of protein, from gels into the 
MS/MS system without significant loss of sample or 
introduction of unwanted contaminants. We employ 
three different systems for introducing gel-purified samples 
into an MS, depending on the level of sensitivity 
required. As an approximate guideline, for samples containing 
tens of picomoles of peptides, LC-MS/MS is 
most appropriate; for samples containing low picomole 
amounts to high femtomole amounts we use capillary 
LC-MS/MS; and for samples containing femtomoles or 
less, CE-MS/MS is the method of choice. 
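
The guideline above amounts to a simple lookup by sample amount. The sketch below is one way to encode it; the numeric cutoffs (10 pmol and 100 fmol) are assumptions chosen to stand for "tens of picomoles" and "high femtomole", since the text gives only approximate ranges, and the function name is hypothetical.

```python
def choose_interface(amount_fmol):
    """Suggest a sample-introduction method for a peptide amount in fmol.

    The cutoffs are illustrative assumptions, not values from the text.
    """
    if amount_fmol < 0:
        raise ValueError("amount must be non-negative")
    if amount_fmol >= 10_000:      # roughly tens of picomoles and above
        return "LC-MS/MS"
    if amount_fmol >= 100:         # low picomole to high femtomole
        return "capillary LC-MS/MS"
    return "CE-MS/MS"              # femtomoles or less


if __name__ == "__main__":
    for amount in (50_000, 500, 5):
        print(amount, "fmol ->", choose_interface(amount))
```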

3.3.1 LC-MS/MS 

The coupling of an MS to an HPLC system using a 
0.5 mm diameter or bigger reverse phase (RP) column 
has been described in detail [42]. This system has several 
advantages if a large number of samples are to be analyzed 
and all are available in sufficient quantity. The 
LC-MS and database searching program can be run in a 
fully automated mode using an autosampler, thus maxi- 
mizing sample throughput and minimizing the need for 
operator interference. The relatively large column is 
tolerant of high levels of impurities from either gel prep- 
aration or sample matrix. Lastly, if configured with a 
flow-splitter and micro-sprayer [40], analyses can be per- 
formed on a small fraction of the sample (less than 5%) 
while the remainder of the sample is recovered in very 
pure solvents. This latter feature is particularly useful 
when an orthogonal technique is also used to analyze 
peptide fractions, such as scintillation of an introduced 
radiolabel, and this data can be correlated with peptides 
identified by CID spectra. 

3.3.2 Capillary LC-MS 

An increase of sensitivity of approximately tenfold can be 
achieved by using a capillary LC system with a 100 µm ID 
column rather than a 0.5 mm ID column as referred to 
above. Since very low flow rates are required for such 
columns, most reports have used a precolumn flow splitting 
system for producing solvent gradients. We have 
recently described the design and construction of a novel 
gradient mixing system which enables the formation 
of reproducible gradients at very low flow rates (low 
nL/min) without the need for flow splitting (A. Ducret 
et al., submitted for publication). Using this capillary 
LC-MS/MS system we were able to identify gel-separated 
proteins if low picomole to high femtomole amounts 
were loaded onto the gel [40]. This system is as yet not 
automated and, like all capillary LC systems, is prone to 
blockage of the columns by microparticulates when analyzing 
gel-separated proteins. 

3.3.3 CE-MS/MS 

The highest level of sensitivity for analyzing gel-separated 
proteins can be achieved by using capillary electrophoresis 
- mass spectrometry (CE-MS). We have described 
in the past a solid-phase extraction capillary electrophoresis 
(SPE-CE) system which was used with triple 
quadrupole and ion trap ESI-MS/MS systems for the 
identification of proteins at the low femtomole to subfemtomole 
sensitivity level [43, 44]. While this system is 
highly sensitive, its operation is labor-intensive and 
has not been automated. In order to devise an 
analytical system with both the sensitivity of CE and 
the level of automation of LC, we have constructed 
microfabricated devices for the introduction of samples 
into ESI MS for high-sensitivity peptide analysis. 

Figure 3. Schematic illustration of a 
microfabricated analytical system for CE, 
consisting of a micromachined device, 
coated capillary electroosmotic pump, 
and microelectrospray interface. The 
dimensions of the channels and reservoirs 
are as indicated in the text. The channels 
on the device were graphically enhanced 
to make them more visible. Reproduced 
from [45], with permission. 

The basic device is a piece of glass into which channels 
of 10-30 µm in depth and 50-70 µm in diameter are 
etched by using photolithography/etching techniques 
similar to the ones used in the semiconductor industry. 
(A simple device is shown in Fig. 3.) The channels are 
connected to an external high voltage power supply [45]. 
Samples are manipulated on the device and off the 
device to the MS by applying different potentials to the 
reservoirs. This creates a solvent flow by electroosmotic 
pumping which can be redirected by changing the position 
of the electrode. Therefore, without the need for 
valves or gates and without any external pumping, the 
flow can be redirected by simply switching the position 
of the electrodes on the device. The direction and rate of 
the flow can be modulated by the size and the polarity 
of the electric field applied and also by the charge state 
of the surface. 

The type of data generated by the system is illustrated in 
Fig. 4, which shows the mass spectrum of a peptide sample 
representing the tryptic digest of carbonic anhydrase at 
290 fmol/µL. Each numbered peak indicates a peptide successfully 
identified as being derived from carbonic anhydrase. 
Some of the unassigned signals may be chemical 
or peptide contaminants. The MS is programmed to automatically 
select each peak and subject the peptide to CID. 
The resulting CID spectra are then used to identify the 
protein by correlation with sequence databases. Therefore, 
this system allows us to concurrently apply a number of 
protein digests onto the device, to sequentially mobilize 
the samples, to automatically generate CID spectra of 
selected peptide ions and to search sequence databases 
for protein identification. These steps are performed automatically 
without the need for user input and proteins can 
be identified at very low femtomole level sensitivity at a 
rate of approximately one protein per 15 min. 

3.4 Assessment of 2-DE-MS proteome technology 

Using a combination of the analytical techniques de- 
scribed above we have identified the 80 protein spots 
indicated in Fig. 5. The protein pattern was generated by 
separating a total of 40 microgram of protein contained 
in a total cell lysate of the yeast strain YFH499 by high 
resolution 2-DE and silver staining of the separated pro- 
teins. To estimate how far this type of proteome analysis 
can penetrate towards the identification of low abundance 
proteins, we have calculated the codon bias of the 
genes encoding the respective proteins. Codon bias is a 
calculated measure of the degree of redundancy of triplet 
DNA codons used to produce each amino acid in a 
particular gene sequence. It has been shown to be a 
useful indicator of the level of the protein product of a 
particular gene sequence present in a cell [46]. The general 
rule which applies is that the higher the value of the 
codon bias calculated for a gene, the more abundant the 
protein product of that gene becomes. The calculated 
codon bias values corresponding to the proteins identified 
in Fig. 5 are shown in Fig. 6b. Nearly all of the proteins 
identified (> 95%) have codon bias values of > 0.2, 
indicating they are highly abundant in cells. In contrast, 
codon bias values calculated for the entire yeast genome 
(Fig. 6a) show that the majority of proteins present in 
the proteome have a codon bias of < 0.2 and are thus of 
low abundance. 

Figure 4. MS spectrum of a tryptic digest 
of carbonic anhydrase using the microfabricated 
system shown in Fig. 3. 290 
fmol/µL of carbonic anhydrase tryptic 
digest was infused into a Finnigan LCQ 
ion trap MS. Each peak was selected for 
CID, and those which were identified as 
containing peptides derived from carbonic 
anhydrase are numbered. Reproduced 
from [45], with permission. 
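
The codon bias comparison in the text comes down to asking what fraction of a gene set lies above the 0.2 threshold. A minimal sketch follows, assuming the codon bias values have already been calculated elsewhere; the function name and the example lists are hypothetical and are not the values plotted in Fig. 6.

```python
def fraction_above(codon_bias_values, threshold=0.2):
    """Fraction of genes whose codon bias exceeds the threshold."""
    values = list(codon_bias_values)
    if not values:
        raise ValueError("need at least one codon bias value")
    return sum(1 for v in values if v > threshold) / len(values)


if __name__ == "__main__":
    # Hypothetical codon bias values, for illustration only.
    identified_spots = [0.45, 0.61, 0.33, 0.72, 0.15]
    whole_genome = [0.05, 0.08, 0.12, 0.45, 0.03, 0.10, 0.61, 0.09]
    print("identified:", fraction_above(identified_spots))
    print("genome:    ", fraction_above(whole_genome))
```

A large gap between the two fractions is what indicates that the identified spots are drawn mostly from the abundant end of the proteome.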

This finding is of considerable importance in our assessment 
of the current status of proteome analysis technology. 
It is clear that even using highly sensitive analytical 
techniques, we are only able to visualize and identify the 
more abundant proteins. Since many important regulatory 
proteins are present only at low abundance, these 
would not be amenable to analysis using such techniques. 
This situation would be exacerbated in the analysis 
of proteomes containing many more proteins than 
the approximately 6000 gene products present in yeast 
cells P6]. In the analysis of, for example, the proteome 
of any human cells, there are potentially 50000-100000 
gene products [47]. Inherent limitations on the amount 
of protein that can be loaded on 2-DE, and the number 
of components that can be resolved, indicate that only 
the most highly abundant fraction of the many gene 
products could be successfully analyzed. One approach 
that has been employed to circumvent these limitations 
is the use of very narrow range immobilized pH gradient 
strips for the first-dimension separation of 2-DE [48]. 
Since only those proteins which focus within the narrow 
range will enter the second dimension of separation, a 
much higher sample loading within the desired range is 
possible. This, in turn, can lead to the visualization and 
identification of less abundant proteins. 

Figure 6. Calculated codon bias values for yeast proteins. (A) Distribution 
of calculated values for the entire yeast proteome. (B) Distribution 
of calculated values for the subset of 80 identified proteins also 
shown in Figs. 1 and 5. Further details of experimental procedures are 
included in S. P. Gygi et al. (submitted). 



4 Utility of proteome analysis for biological 
research 

For the success of proteomics as a mainstream approach 
to the analysis of biological systems it is essential to 
define how proteome analysis and biological research 
projects intersect. Without a clear plan for the implementation 
of proteome-type approaches into biological research 
projects the full impact of the technology cannot 
be realized. The literature indicates that proteome analysis 
is used both as a database/data archive, and as a biological 
assay or biological research tool. 

4.1 The proteome as a database 

The use of proteomics as a database or data archive 
essentially entails an attempt to identify all the proteins 
in a cell or species and to annotate each protein with the 
known biological information that is relevant for each 
protein. The level of annotation can, of course, be extensive. 
The most common implementation of this idea is 
the separation of proteins by high resolution 2-DE, the 
identification of each detected protein spot and the 
annotation of the protein spots in a 2-DE gel database 
format. This approach is complicated by the fact that it is 
difficult to precisely define a proteome and to decide 
which proteome should be represented in the database. 
In contrast to the genome of a species, which is essen- 
tially static, the proteome is highly dynamic. Processes 
such as differentiation, cell activation and disease can all 
significantly change the proteome of a species. This is 
illustrated in Fig. 7. The figure shows two high-resolution 
2-DE maps of proteins isolated from rat serum. 
Fig. 7A is from the serum of normal rats, while Fig. 7B 
is from the serum of rats in acute phase after 
prior treatment with an inflammation-causing agent [49]. 
It is obvious that the protein patterns are significantly 
different in several areas, raising the question of exactly 
which proteome is being described. 

Therefore, a comprehensive proteome database of a species
or cell type needs to contain all of the parameters
which describe the state and the type of the cells from
which the proteins were extracted as well as the software
tools to search the database with queries which reflect
the dynamics of biological systems. A comprehensive
proteome database should be capable of quantitatively
describing the fate of each protein if specific systems
and pathways are activated in the cell. Specifically, the
quantity, the degree of modification, the subcellular location
and the nature of molecules specifically interacting
with a protein as well as the rate of change of these
variables should be described. Using these admittedly
stringent criteria, there is currently no complete proteome
database. A number of such databases are, however, in
the process of being constructed. The most advanced
among them, in our opinion, are the yeast protein database
YPD [50] (accessible at http://www.ypdLcom) and
the human 2D-PAGE databases of the Danish Centre
for Human Genome Research [12] (accessible at
http://biobase.dk/cgi-bin/celis). While neither can be
considered complete as not all of the potential gene products
are identified, both contain extensive annotation
of supplemental information for many of the spots
which are positively identified in reference samples.

4.2 The proteome as a biological assay 

The use of proteome analysis as a biological assay or
research tool represents an alternative approach to integrating
biology with proteomics. To investigate the state
of a system, samples are subjected to a specific process
that allows the quantitative or qualitative measurement
of some of the variables which describe the system. In
typical biochemical assays one variable (e.g., enzyme
activity) of a single component (e.g., a particular enzyme)
is measured. Using proteomics as an assay, multiple
variables (e.g., expression level, rate of synthesis,
phosphorylation state, etc.) are measured concurrently
on many (ideally all) of the proteins in a sample. The
use of proteomics as an assay is a less far-reaching proposition
than the construction of a comprehensive proteome
database. It does, however, represent a pragmatic
approach which can be adapted to investigate specific
systems and pathways, as long as the interpretation of
the results takes into account that with current technology
not all of the variables which describe the system
can be observed (see Section 3.4).

A common implementation of proteome analysis as a
biological assay is when a 2-DE protein pattern generated
from the analysis of an experimental sample is
compared to an array of reference patterns representing
different states of the system under investigation. The
state of the experimental system at the time the sample
was generated is therefore determined by the quantitative
comparative analysis of hundreds to a few thousand
proteins. Comparative analysis of the 2-DE patterns furthermore
highlights quantitative and qualitative differences
in the protein profiles which correlate with the
state of the system. For this type of analysis it is not
essential that all the proteins are identified or even visualized,
although the results become more informative as
more proteins are compared. It is obvious, however, that
the possibility to identify any protein deemed characteristic
for a particular state dramatically enhances this
approach by opening up new avenues for experimentation.




Figure 7. High resolution 2-DE map of proteins isolated from rat serum with or without prior exposure to an
inflammation-causing agent. (A) Normal rat serum, (B) acute-phase serum from rats which had previously been exposed to
an inflammation-causing agent. The first dimension of separation is an IPG from pH 4-10, and the second dimension
is a 7.5-17.5%T gradient SDS-PAGE gel. Proteins were visualized by staining with amido black. Further details
of experimental procedures are included in [14, 49].

Proteome analysis as a biological assay has been successfully
used in the field of toxicology, to characterize
disease states or to study differential activation of cells.
The approach is limited, of course, by the fact that only
the visible protein spots are included in the assay, and it
is well known that a substantial but far from complete
fraction of cellular proteins are detected if a total cell
lysate is separated by 2-DE. Proteins may not be
detected in 2-DE gels because they are not abundant
enough to be visualized by the detection method used,
because they do not migrate within the boundaries (size,
pI) resolved by the gel, because they are not soluble
under the conditions used, or for other reasons.

A different way to use proteome analysis as a biological
assay to define the state of a biological system is to take
advantage of the wealth of information contained in
2-DE protein patterns. 2-DE is referred to as two-dimensional
because of the electrophoretic mobility and the
isoelectric points which define the position of each protein
in a 2-DE pattern. In addition to the two dimensions
used to generate the protein patterns, a number of
additional data dimensions are contained in the protein
patterns. Some of these dimensions such as protein
expression level, phosphorylation state, subcellular location,
association with other proteins, rate of synthesis or
degradation indicate the activity state of a protein or a
biological system. Comparative analysis of 2-DE protein
patterns representing different states is therefore ideally
suited for the detection, identification and analysis of
suitable markers. Once again it must be emphasized that
in this type of experiment only a fraction of the cellular
proteins is analyzed. Since many regulatory proteins are
of low abundance, this limitation is a concern, particularly
in cases in which regulatory pathways are being
investigated.

5 Concluding remarks 

In this report we have addressed three main issues
related to proteome analysis. First, we have discussed
the rationale for studying proteomes. Second, we have
assessed the technical feasibility of analyzing proteomes
and described current proteome technology, and third,
we have analyzed the utility of proteome analysis for biological
research. It is apparent that proteome analysis is
an essential tool in the analysis of biological systems.
The multi-level control of protein synthesis and degradation
in cells means that only the direct analysis of
mature protein products can reveal their correct identities,
their relevant state of modification and/or association
and their amounts. Recently developed methods
have enabled the identification of proteins at ever-increasing
sensitivity levels and at a high level of automation
of the analytical processes. A number of technical
challenges, however, remain. While it is currently
possible to identify essentially any protein spots that can
be visualized by common staining methods, it is apparent
that without prior enrichment only a relatively
small and highly selected population of long-lived,
highly expressed proteins is observed. There are many
more proteins in a given cell which are not visualized by
such methods. Frequently it is the low abundance proteins
that execute key regulatory functions.



We have outlined the two principal ways proteome analysis
is currently being used to intersect with biological
research projects: the proteome as a database or data
archive and proteome analysis as a biological assay. Both
approaches have in common that at present they are conceptually
and technically limited. Current proteome databases
typically are limited to one cell type and one state
of a cell and therefore do not account for the dynamics
of biological systems. The use of proteome analysis as a
biological assay can provide a wealth of information, but
it is limited to the proteins detected and is therefore not
truly proteome-wide. These limitations in proteomics are
to a large extent a reflection of the fact that proteins in
their fully processed form cannot easily be amplified and
are therefore difficult to isolate in amounts sufficient for
analysis or experimentation. The fact that to date no
complete proteome has been described further attests to
these difficulties. With continued rapid progress in protein
analysis technology, however, we anticipate that the
goal of complete proteome analysis will eventually
become attainable.
from the National Science Foundation Science and Technol-
High-throughput technologies, such as proteomic screening and DNA micro-arrays, produce vast
amounts of data requiring comprehensive analytical methods to decipher the biologically relevant
results. One approach would be to manually search the biomedical literature; however, this would be
an arduous task. We developed an automated literature-mining tool, termed MedGene, which
comprehensively summarizes and estimates the relative strengths of all human gene-disease
relationships in Medline. Using MedGene, we analyzed a novel micro-array expression dataset
comparing breast cancer and normal breast tissue in the context of existing knowledge. We found no
correlation between the strength of the literature association and the magnitude of the difference in
expression level when considering changes as high as 5-fold; however, a significant correlation was
observed (r = 0.41; p = 0.05) among genes showing an expression difference of 10-fold or more.
Interestingly, this only held true for estrogen receptor (ER) positive tumors, not ER negative. MedGene
identified a set of relatively understudied, yet highly expressed genes in ER negative tumors worthy of
further examination.

Keywords: bioinformatics • micro-array • text mining • gene-disease association • breast cancer



Introduction 

At its current pace, the accumulation of biomedical literature
outpaces the ability of most researchers and clinicians to stay
abreast of their own immediate fields, let alone cover a broader
range of topics. For example, to follow a single disease, e.g.,
breast cancer, a researcher would have had to scan 130 different
journals and read 27 papers per day in 1999. 1 This problem is
accentuated with high-throughput technologies such as DNA
micro-arrays and proteomics, which require the analysis of
large datasets involving thousands of genes, many of which are
unfamiliar to a particular researcher. In any microarray experiment,
thousands of genes may demonstrate statistically significant
expression changes, but only a fraction of these may
be relevant to the study. The ability to interpret these datasets
would be enhanced if they could be compared to a comprehensive
summary of what is known about all genes. Thus, there
is a need to summarize existing knowledge in a format that
allows for the rapid analysis of associations between genes and
diseases or other specific biological concepts.

One solution to this problem is to compile structured digital
resources, such as the Breast Cancer Gene Database 1 and the
Tumor Gene Database. 2 However, as these resources are hand-curated,
the labor-intensive review process becomes a rate-limiting
step in the growth of the database. As a result, these
databases have a limited scale and the genes are not selected
in a systematic fashion.

An alternative approach is automated text mining, a method
which involves automated information extraction by searching
documents for text strings and analyzing their frequency and
context. This approach has been used successfully in several
instances for biological applications. In most cases, it has been
applied to extract information about the relationships or
interactions that proteins or genes have with one another, in
the literature or by functional annotation. 3-7 Thus far, few
publications have applied text-mining to examine the global
relationships between genes and diseases. Perez-Iratxeta et al.
automatically examined the GO (Gene Ontology) annotation
of genes and their predicted chromosomal locations in order
to identify genes linked to inherited disorders. 8

To obtain a more global understanding of disease development
it would be valuable to incorporate information regarding
all possible gene-disease relationships, including biochemical,
physiological, pharmacological, epidemiological, as well as
genetic. This information would enable comprehensive comparisons
between large experimental datasets and existing
knowledge in the literature. This would accomplish two things.
First, it would serve to validate experiments by demonstrating
that known responses occur as predicted. Second, it would
rapidly highlight which genes are corroborated by the literature
and which genes are novel in a given context. We have utilized
a computational approach to literature mining to produce a
comprehensive set of gene-disease relationships. In addition,
we have developed a novel approach to assess the strength of
each association based on the frequency of citation and co-citation.
We applied this tool to help interpret the data from a
large micro-array gene expression experiment comparing
normal and cancerous breast tissue.



Methods 

MedGene Database. MedGene is a relational database, storing
disease and gene information from NCBI, text mining results,
statistical scores, and hyperlinks to the primary literature.
MedGene has a web-based user interface for users to
query the database (http://hipseq.med.harvard.edu/MedGene/).

Text Mining Algorithms. MeSH files were downloaded from
the MeSH web site at NLM (National Library of Medicine)
(http://www.nlm.nih.gov/mesh/meshhome.html) and human disease
categories were selected. LocusLink files were downloaded from
the LocusLink web site at NCBI (http://www.ncbi.nih.gov/LocusLink/).
Official/preferred gene symbol, official/preferred
gene name, and gene alternative symbols and names, all
relevant annotations and URLs for each LocusLink record, were
collected. Gene search terms were used for literature searching
and included all qualified gene names, gene symbols, and gene
family terms. Primary gene keys, predominantly qualified gene
family terms and gene official/preferred symbols, were used
to index Medline records. If the official/preferred gene symbols
did not meet the standards to be an index, then qualified gene
official/preferred names were used. A local copy of Medline
records (up to July, 2002) was pre-selected.

A JAVA module examined the MeSH terms and then indexed 
each Medline record with the appropriate disease terms. A 
separate JAVA module was used to examine the titles and 
abstracts for gene search terms and then to index the gene- 
related Medline records with the relevant primary gene key(s). 

Statistical Methods. For every gene and disease pair, we
counted records that were indexed for both gene and disease
(double positive hits), for disease only (disease single hits), for
gene only (gene single hits), and for neither gene nor disease
(double negative hits) to generate a 2 x 2 contingency table.
On the basis of the contingency table framework, we applied
different statistical methods to estimate the strength of gene-disease
relationships and evaluated the results. These methods
included chi-square analysis, Fisher's exact probabilities, relative
risk of gene, and relative risk of disease
(http://hipseq.med.harvard.edu/MedGene/). In addition, we computed
the "product of frequency", which is the product of the
proportion of disease/gene double hits to disease single hits
and the proportion of disease/gene double hits to gene single
hits. To obtain a normal distribution, we transformed all the
statistical scores using the natural logarithm. We selected the
log of the product of frequency (LPF) to validate MedGene and
to use for the analysis with the micro-array data. Spearman
rank-correlation coefficients were used to assess the linear
relationship between LPF and micro-array fold change in
expression level.
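
The counting scheme and the LPF score described above can be illustrated with a minimal Python sketch. This is not the authors' JAVA implementation. The function names, the record layout (one pair of gene-key and disease-term sets per Medline record) and the toy records are assumptions made for illustration. The denominators follow the wording of the text literally (records indexed for the disease only and for the gene only); if total disease and total gene hits were intended instead, the denominators would be double hits plus the respective single hits.

```python
import math
from typing import NamedTuple, Optional


class Contingency(NamedTuple):
    double_hits: int    # records indexed for both gene and disease
    disease_only: int   # "disease single hits"
    gene_only: int      # "gene single hits"
    neither: int        # "double negative hits"


def build_table(records, gene, disease):
    """Count the four cells of the 2 x 2 table for one gene-disease pair.

    `records` is an iterable of (gene_keys, disease_terms) pairs, one per
    indexed record (hypothetical layout, chosen for this sketch).
    """
    a = b = c = d = 0
    for gene_keys, disease_terms in records:
        has_gene = gene in gene_keys
        has_disease = disease in disease_terms
        if has_gene and has_disease:
            a += 1
        elif has_disease:
            b += 1
        elif has_gene:
            c += 1
        else:
            d += 1
    return Contingency(a, b, c, d)


def log_product_of_frequency(table) -> Optional[float]:
    """Natural log of (double/disease single) * (double/gene single).

    Returns None when the score is undefined (any of the three counts is 0).
    """
    if 0 in (table.double_hits, table.disease_only, table.gene_only):
        return None
    product = (table.double_hits / table.disease_only) * (
        table.double_hits / table.gene_only
    )
    return math.log(product)


# Toy records, invented for illustration only.
toy_records = [
    ({"TP53"}, {"Breast Neoplasms"}),
    ({"TP53"}, {"Breast Neoplasms"}),
    ({"TP53"}, {"Prostatic Neoplasms"}),
    ({"ESR1"}, {"Breast Neoplasms"}),
    ({"ESR1"}, {"Breast Neoplasms"}),
    (set(), set()),
]
table = build_table(toy_records, "TP53", "Breast Neoplasms")
lpf = log_product_of_frequency(table)
```

Because each record contributes to exactly one cell, a gene mentioned several times in the same abstract is still counted once, which matches the indexing rule used in this study.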

Global Analysis. Diseases with at least 50 related genes were
selected for clustering analysis, and the LPF scores were
normalized with total score for each disease. Hierarchical
clustering was done with the "Cluster" software and the
clustering result was visualized using "TreeViewer"
(http://rana.lbl.gov/EisenSoftware.htm).

Breast Tissue Micro-Arrays. Eighty-nine breast cancer
samples (79% ER-positive) and 7 normal breast tissue samples
were selected from the Harvard Breast SPORE frozen tissue
repository and were representative of the spectrum of histological
types, grades, and hormone receptor immuno-phenotypes
of breast cancer. Biotinylated cRNA, generated from the
total RNA extracted from the bulk tumor, was hybridized to
Affymetrix U95A oligo-nucleotide micro-arrays. These micro-arrays
consist of 12 400 probes, which represent approximately
9000 genes. Raw expression values were obtained using GENECHIP
software from Affymetrix, and then further analyzed using
the DNA-Chip Analyzer (dChip) custom software.

Results 

Automated Indexing of Medline Records by Disease and
Gene. To study the gene-disease associations in the literature,
we first compiled complete lists for human diseases and human
genes. To index all Medline records that were relevant to
human diseases, the Medical Subject Heading (MeSH) index
of Medline records was utilized. MeSH is a controlled medical
vocabulary from the National Library of Medicine and consists
of a set of terms or subject headings that are arranged in both
an alphabetic and a hierarchical structure. Medline records
are reviewed manually and MeSH terms are added to each with
software assistance. 9,10 Twenty-three human disease category
headings along with all of their child terms (see the Supporting
Information, Supplemental Table 1, or visit
http://hipseq.med.harvard.edu/MedGene/publication/s_Table 1.html) were
selected from the 2002 MeSH index creating a list of 4033
human diseases.

No index comparable to the MeSH index exists for genes,
and thus it was necessary to apply a string search algorithm
for gene names or symbols found in Medline text. A complete
list of genes, gene names, gene symbols, and frequently used
synonyms was collected from the LocusLink database at
NCBI, 11,12 which contains 53 259 independent records keyed
by an official gene symbol or name (June 18th, 2002). For the
purposes of this study, no distinction was made between genes
and their gene products. Authors often use the same name for
both, differentiating the two only by the use of italics, if at all.
For the intended use of this study, this lack of distinction is
unlikely to have a large effect and may in fact be beneficial.

Initial attempts to search the literature using these lists 
revealed several sources of false positives and false negatives 
(Table 1). False positives primarily arose when the searched 
term had other meanings, whereas false negatives arose from 
syntax discrepancies necessitating the development of filters 
to reduce these errors. The syntax issues were readily handled 
by including alternate syntax forms in the search terms. The 
false positive cases, caused by duplicative and unrelated 
meanings for the terms, were more difficult to manage. Where 
possible, case sensitive string mapping reduced inappropriate 
citations. In many cases, however, this was not sufficient and 
the terms had to be eliminated entirely, thereby reducing the 
false positive rate but unavoidably under-representing some 
genes. 

For the purposes of data tracking, a primary gene key was 
selected to represent all synonyms that correspond to each 
gene. Medline records were indexed with a primary gene key 
when any synonym for that key was found in the title or 
abstract. Case-insensitive string mapping was used for all
searches except as noted above. No additional weight was 
Table 1. Systematic Sources of False Positives and False Negatives in Unfiltered Data (a)

source of error                         error type       example                                            filter solution
gene symbol/name is not unique          false positive   MAG - myelin associated glycoprotein;              eliminate this term
                                                         MAG - malignancy-associated protein
gene symbol is unrelated abbreviation   false positive   PA - pallid homologue (mouse), pallidin            eliminate this term
                                                         (also abbrev. for Pennsylvania)
gene symbol/name has language meaning   false positive   WAS - Wiskott-Aldrich Syndrome                     case-sensitive string search
                                                         (also the word "was")
nonstandard syntax                      false negative   BAG-1 instead of BAG1                              add dash term
unofficial gene name/symbol             false negative   P53 instead of TP53                                add all gene nicknames
nonspecified gene name                  false negative   estrogen receptor instead of Estrogen receptor 1   add family stem term

(a) In preliminary studies, Medline was searched for co-occurrence of genes and diseases and the resulting output was evaluated to identify error sources that
[...] is categorized by the type of error it causes: false positives are suggested relationships that are not real and
[...] error. In general, error rates maximized sensitivity, even at the expense of specificity if needed.


added for multiple occurrences of a term or the co-occurrence 
of multiple synonyms for the same gene key. 
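
The indexing rule described here (any synonym maps a record to its primary gene key, matching is case-insensitive unless a term needs case-sensitive treatment, and repeated mentions add no weight) can be sketched in Python as follows. This is an illustrative sketch only, not the JAVA module used in the study. The function names, the word-boundary rule and the small synonym dictionary are assumptions; the example terms (WAS, BAG-1, P53) are taken from Table 1.

```python
import re


def compile_gene_patterns(synonyms, case_sensitive_terms=frozenset()):
    """Build one regex per search term, grouped by primary gene key.

    `synonyms` maps a primary gene key to its list of search terms.
    Terms in `case_sensitive_terms` are matched with exact case; all
    other terms are matched case-insensitively.
    """
    patterns = {}
    for key, terms in synonyms.items():
        compiled = []
        for term in terms:
            flags = 0 if term in case_sensitive_terms else re.IGNORECASE
            # Assumed boundary rule: the term must not touch a letter or digit.
            regex = r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])"
            compiled.append(re.compile(regex, flags))
        patterns[key] = compiled
    return patterns


def index_record(text, patterns):
    """Return the set of primary gene keys found in a title or abstract.

    A key is reported once, however many times or under however many
    synonyms it occurs, so no extra weight is given to repeats.
    """
    return {
        key
        for key, regexes in patterns.items()
        if any(regex.search(text) for regex in regexes)
    }


# Hypothetical synonym list built from the Table 1 examples.
synonyms = {
    "TP53": ["TP53", "P53"],      # unofficial nickname added
    "BAG1": ["BAG1", "BAG-1"],    # dash variant added
    "WAS": ["WAS"],               # collides with the English word "was"
}
patterns = compile_gene_patterns(synonyms, case_sensitive_terms={"WAS"})

first = index_record("Expression of p53 was reduced in BAG-1 positive tumors.", patterns)
second = index_record("Mutations in WAS cause Wiskott-Aldrich syndrome.", patterns)
```

In this sketch the lowercase word "was" in the first sentence does not index the record under WAS, while the uppercase symbol in the second sentence does.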

Medline records were searched with all qualified gene
identifiers, such as the official/preferred gene symbol, the
official/preferred gene name, all gene nicknames and all syntax
variants. In situations where there are several members of a
gene family or splice variants, some authors prefer to use a
shortened gene family name, e.g., estrogen receptor instead of
estrogen receptor 1 (ESR1), creating a source of false negatives.
For this reason, gene family stem terms were created for all
genes that have an alpha or numerical suffix (e.g., IL2RA, TGFβ,
ESR1, etc.) and then used to search the literature. The family
stem terms were handled separately from the specific gene
names so that it would be clear when linkages were made to
the gene family versus a specific member in that family.

To improve performance and accuracy, some pre-selection
was applied to the records that were scanned. First, review
articles were eliminated to avoid redundant treatment of
citations. Second, non-English journals were removed because
the natural language filters were only relevant to English
publications. Finally, journals unlikely to contain primary data
about gene-disease relationships were also removed (e.g., Int.
J. Health Educ., Bedside Nurse, and J. Health Econ.). Together,
these filters reduced the 12 198 221 Medline publications (July
2002) by 37%.

Ranking the Relative Strengths of Gene-Disease Associations.
In total, there were 618 708 gene-disease co-citations,
in which 16% (8297) of all studied genes had been associated
to a disease and 96% (3875) of all diseases had been associated
to at least one gene. To rank the relative strengths of gene-disease
relationships, we tested several different statistical
methods and examined the results. With the exception of the
relative risk estimates, the methods provided similar results
with respect to the rank order of the gene-disease association
strengths. However, after comparing the results to other
databases and after consulting disease experts, the log of the
product of frequency (LPF) was selected for further analysis
because it gave the best results overall.

Validation of MedGene. In developing this tool, it was
important to minimize the number of missed genes (false
negatives) and miscalled genes (false positives). However, in
situations when these goals were in conflict, inclusiveness was
prioritized. To determine the false negative rate in MedGene,
breast cancer was used as a test case because it was associated
with more genes than any other human disease and because



Figure 1. Estimation of the false negative rate by comparison
with hand-curated databases. The breast cancer-related genes
identified by MedGene were compared with those listed in
several other databases including the Tumor Gene Database
(TGD), 2 the Breast Cancer Gene Database (BCG), 1 GeneCards
(GC) and Swissprot. Genes were considered false negatives
if they were represented in at least one of these other databases
and not in MedGene and their link to breast cancer was supported
by at least one literature reference. All literature references
were verified by manual review to confirm their validity. The
number of genes in each database or shared by more than one
database is indicated. The false negative rate was calculated by
genes missed at MedGene (26)/total number of nonoverlapping
genes in other databases (285).



there were several public databases that link genes to breast
cancer. We compared the list of breast cancer-related genes
from MedGene to these databases, as illustrated in Figure 1.
Among the 285 distinct breast cancer-related genes that were
supported by at least one literature citation in these hand-curated
databases, 26 were absent from MedGene, suggesting
a false negative rate of approximately 9%. To determine why
these were missed, all literature references for these genes (80
Journal of Proteome Research • Vol. 2, No. 4, 2003 407




papers) were reviewed manually (see the Supporting Information,
Supplemental Table 2, or visit
http://hipseq.med.harvard.edu/MedGene/publication/s_Table 2.html). Among
these papers, most false negatives were caused by nonstandard
gene terms or gene terms eliminated by our specificity filters.
Few genes were missed because they were only mentioned in
review papers (0.4%) or they appeared only in the body of the
manuscript but not the abstract or title (1.1%). Of note,
MedGene identified approximately 2000 additional breast
cancer-related genes not listed in any other database.
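
The false negative estimate reported above is a simple set comparison, which can be sketched in Python as follows. The function name and the placeholder gene symbols in the toy call are hypothetical. Only the counts 26 and 285 come from the text.

```python
def false_negative_rate(reference_databases, tool_genes):
    """Fraction of reference genes that the tool failed to report.

    `reference_databases` is a list of gene sets from curated sources.
    Their union gives the nonoverlapping reference list, and any
    reference gene absent from `tool_genes` counts as missed.
    """
    reference = set().union(*reference_databases)
    if not reference:
        raise ValueError("reference gene list is empty")
    missed = reference - set(tool_genes)
    return len(missed) / len(reference), missed


# Toy call with placeholder symbols (not real data).
toy_rate, toy_missed = false_negative_rate(
    [{"GENE_A", "GENE_B"}, {"GENE_B", "GENE_C"}],
    {"GENE_A", "GENE_C", "GENE_D"},
)

# Counts reported in the text: 26 missed genes out of 285 reference genes.
reported_rate = 26 / 285
```

With the reported counts, the ratio is about 0.091, in line with the approximately 9% stated above. Genes found only by the tool (GENE_D in the toy call) do not enter this estimate.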

To assess the false positive error rate, two complementary 
approaches were used: a detailed analysis of one disease and 
a global examination of 1000 diseases. The detailed approach 
examined the false positive error rate and its sources, whereas 
the global approach tested whether the overall results made 
biomedical sense. 

Using the LPF, 1467 genes related to prostate cancer were
assembled in rank order. We then retrieved approximately 300
Medline records each for the highest ranked 100 and the lowest
ranked 200 genes and manually reviewed the titles and
abstracts to determine the verity of the association. Nearly 80%
of the highest ranked 100 genes fell into one of the five
categories that reflect meaningful gene-disease relationships
(see the Supporting Information, Supplemental Table 3, or visit
http://hipseq.med.harvard.edu/MedGene/publication/s_Table 3.html).
Among the lowest ranked 200 genes, approximately
70% reflected true relationships. Of the 600 records
reviewed there were only two in which the association between
the gene and the disease was described as negative. Both were
genes with very low scores. In both cases, the authors did not
argue the absence of any relationship, but rather that a
particular feature of the gene or protein was not shown to be
related to human prostate cancer.

The coincidence of some gene symbols with medical abbreviations,
chemical abbreviations and biological abbreviations
resulted in most of the false positives (see the Supporting
Information, Supplemental Table 4, or visit
http://hipseq.med.harvard.edu/MedGene/publication/s_Table 4.html),
emphasizing the importance of the filters that were added in the
search algorithm (Table 1). Without the filters, the false positive
rate more than doubled, and the false negative rate rose
dramatically (data not shown). For example, among the papers
about breast cancer, there were only 12 Medline records that
referred to ESR1 and 10 to ESR2, whereas almost 2000 papers
mentioned estrogen receptor without specifying ESR1 or ESR2;
this latter group was detected by the family stem term filter.

To further validate these results, a global analysis of the gene-disease
relationships described by MedGene was performed.
For this experiment, it was reasoned that the more closely
related the diseases are to one another, the more they will be
related to the same gene sets. Thus, if the relationships defined
by MedGene accurately reflected the literature, then an unsupervised
hierarchical clustering of the gene data should group
diseases in a manner consistent with common medical thinking.
Conversely, if the clustered diseases do not make sense
biologically or medically, it may reflect excessive false positives,
false negatives, or inappropriate scoring of the data.

To execute this experiment, the gene sets and the corresponding
LPF values for 1000 randomly selected diseases (each
with at least 50 gene relationships) were used as a dataset for
clustering the diseases. A review of the results showed that the
resulting disease clusters were indeed logical based upon
common medical knowledge (see the Supporting Information,
Supplemental Figure 1, or visit
http://hipseq.med.harvard.edu/MedGene/publication/s_Figure 1.html).
For example, in one such cluster shown in Figure 2, diabetes
and its complications grouped together and were also closely
linked to diseases associated with starvation states.

The number of genes associated with a given disease can
be estimated by adjusting the MedGene number up by the false
negative rate (~9%) and down by the false positive rate (~26%
on average). Using this, the average disease has 103.7 ± 45.3
(mean ± s.d.) genes associated with it, although the range is
quite broad with 2359 genes related to breast cancer, 2122
genes related to lung cancer and no genes related to a number
of diseases.

Applying MedGene to the Analysis of Large Datasets. Access 
to a comprehensive summary of the genes linked to human 
diseases provided an opportunity to analyze data obtained from 
a high-throughput experiment. We compared the MedGene 
breast cancer gene list to a gene expression data set generated 
from a micro-array analysis comparing breast cancer and 
normal breast tissue samples. Micro-array analysis identified 
2286 genes that had greater than a 1-fold difference in mean 
expression level between breast cancer samples and normal 
breast samples. Using MedGene, we sorted the 2286 genes into 
four classes: 555 genes directly linked to breast cancer in the 
literature by gene term search (first-degree association by gene 
name); 328 genes directly linked by family term search (first- 
degree association by family term); 1021 genes linked to breast 
cancer only through other breast cancer genes (second-degree 
association); and 505 genes not previously associated with 
breast cancer. (See the Supporting Information, Supplemental 
Figure 2, or visit 
http://hipseq.med.harvard.edu/MedGene/publication/s_Figure2.html.) 
Among the 505 previously unrelated genes, 467 were either 
newly identified genes or genes 
that had not previously been associated with any disease. 
Among the remaining 38 genes, 9 had been related to other 
cancers, specifically esophageal, colon, uterine, skin, and cervix. 
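
The four-way sort described above can be illustrated with a short Python sketch. The data structures here are hypothetical (sets of gene symbols and a gene-to-gene co-citation map); the text does not say how MedGene stores these relationships, so this shows the classification logic only.

```python
def classify_gene(gene, by_gene_term, by_family_term, cocited_genes):
    """Return the association class of `gene` with respect to one disease.

    by_gene_term   -- set of genes linked to the disease by gene term search
    by_family_term -- set of genes linked to the disease by family term search
    cocited_genes  -- dict mapping a gene to the set of genes it is co-cited with
    (All three inputs are assumed structures, used for illustration.)
    """
    if gene in by_gene_term:
        return "first-degree (gene term)"
    if gene in by_family_term:
        return "first-degree (family term)"
    # Second degree: linked to the disease only through a first-degree gene.
    first_degree = by_gene_term | by_family_term
    if cocited_genes.get(gene, set()) & first_degree:
        return "second-degree"
    return "not previously associated"


if __name__ == "__main__":
    gene_hits = {"BRCA1", "ERBB2"}
    family_hits = {"CCND1"}
    links = {"GENE_X": {"BRCA1"}, "GENE_Y": {"GENE_Z"}}
    for g in ["BRCA1", "CCND1", "GENE_X", "GENE_Y"]:
        print(g, "->", classify_gene(g, gene_hits, family_hits, links))
```

The checks run in order of decreasing directness, so a gene found by both gene term and family term search is reported once, under the gene term class.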

To determine whether the genes highlighted by the micro- 
array analysis were more likely to have been previously linked 
to breast cancer in the literature, we created a two-dimensional 
plot of the fold change of expression level between breast 
cancer and normal tissue versus the literature score (LPF) 
(Figure 3A). There was a broad spread of expression changes 
among the genes directly linked to breast cancer ranging from 
less than 1-fold change (68%) to over 40-fold (0.3%). Notably, 
the majority of genes with greater than 10-fold expression 
changes were linked to breast cancer by first-degree 
association. 

Among all 754 genes directly linked to breast cancer in the 
literature, there was no correlation between LPF and micro- 
array fold change (r = 0.018, p-value = 0.62). However, when 
we stratified the analysis based on the magnitude of the fold 
change, we observed an increasing trend in correlation (Figure 
3B), suggesting that genes with a more substantial change in 
expression level were more likely to have a stronger association 
in the literature. For genes that had 10-fold change or more in 
expression level, the correlation increased to 0.41 (p-value = 
0.05). 
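
A stratified rank correlation of this kind can be sketched as follows. The Figure 3 caption names the Spearman rank-correlation coefficient; everything else here is an assumption made for illustration: the input layout (pairs of literature score and fold change), the use of the absolute fold change as the magnitude, and the minimum stratum size of three genes.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def stratified_spearman(genes, thresholds, min_genes=3):
    """Correlate literature score with fold-change magnitude per threshold.

    genes -- list of (literature_score, fold_change) pairs (assumed layout)
    """
    result = {}
    for threshold in thresholds:
        subset = [(s, abs(f)) for s, f in genes if abs(f) >= threshold]
        if len(subset) >= min_genes:
            scores, folds = zip(*subset)
            result[threshold] = spearman(list(scores), list(folds))
    return result
```

Calling `stratified_spearman(genes, [1, 5, 10])` returns one coefficient per fold-change cutoff, which is the quantity plotted against the cutoff in a figure such as 3B. Significance testing (the p-values quoted in the text) is left out of this sketch.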

When we evaluated the micro-array data separately for ER 
positive and ER negative tumors, the trend in correlation 
between fold change and literature score was highly dependent 
on estrogen receptor status. Interestingly, there was a similar 
trend in correlation for ER positive tumors, but no trend in 
correlation for ER negative tumors. 

Figure 2. Global validation by clustering analysis. 2(A). The gene sets and the corresponding LPF values for 1000 diseases, each with 
at least 50 gene relationships, were used in an unsupervised clustering of the diseases based on the gene patterns associated with 
them. A sample of the data is shown here. 2(B). One of the resulting clusters is shown that corresponds to blood sugar states. Diabetes 
terms (above the line) and starvation states terms (under the line) clustered together. Within these groups, there is also clustering of 
diabetic small vessel complications, altered serum chemistries, nutritional disorders, etc. (Supplemental Figure 1: 
http://hipseq.med.harvard.edu/MedGene/publication/s_Figure1.html). 

Finally, to validate our findings, we computed similar 
correlations between the breast cancer expression data and 
LPF scores generated by MedGene for hypertension, a 
disease unrelated to breast cancer. As expected, we did not 
observe an increasing trend in correlation for hypertension. 

Figure 3. Relationship between literature score and functional data for breast cancer. 3A. The data from an expression analysis of 
samples for breast tumors and normal breast tissue were analysed to indicate the fold difference of expression level between breast 
tumor and normal sample (cutoff > 3-fold change). The fold changes were plotted against the literature score for the same gene set. 
Green dots represent first-degree association by gene search, blue dots represent first-degree association by family search and red 
dots represent no association. Some well-studied genes, such as BRCA2 (pink circle), are not reflected by a substantial difference in 
expression level. Furthermore, the majority of genes that have no association with breast cancer in the literature had less than 10-fold 
expression changes (shaded area). 3B. The Spearman rank-correlation coefficients between literature score (LPF) and the fold change 
of expression level between tumor and normal breast samples (y-axis) in relation to the amount of fold change of expression level 
(x-axis). Gene rank lists were generated for breast cancer (blue) and hypertension (pink). Correlations were also computed between 
the breast cancer gene LPF scores and fold change expression data among estrogen receptor positive tumors only (light blue) and 
estrogen receptor negative tumors only (purple). 

breast neoplasm    hypertension          rheumatoid arthritis      bipolar disorder     atherosclerosis 

estrogen receptor  REN                   RA                        ERDA1                apolipoprotein 
PGR                DBP                   TNFRSF10A                 SNAP29               APOE 
ERBB2              LEP                   CRP                       PFKL                 LDLR 
BRCA1              AGT                   AS                        DRD2                 ELN 
BRCA2              INS                   ESttl                     TRH                  ARG1 
EGFR               kallikrein            HLA-DRB1                  IMPA2                APOB 
CYP19              ACE                   DR1                       HTR3A                APOA1 
TFF1               endothelin            interleukin               DRD3                 MSR1 
PSEN2              5/0046                TNF                       REM                  LPL 
TP53               SDK                   IL6                       KCNN3                PON1 
CES3               OIANPH                collagen                  DRD4                 plasminogen activator inhibitor 
CEACAM5            SARI                  IL1A                      HTR2C                PLC 
ERBB3              PJH                   ACR                       RELN                 vascular cell adhesion molecule 
cyclin             CD59                  TNFRSF12                  DBH                  ATOH1 
COX5A              ALB                   IL2                       MAOA                 VWF 
cathepsin          CYP11B2               CHI3L1                    COMT                 INS 
ERBB4              MAT2B                 IL8                       HTR2A                ARG2 
TRAM               angiotensin receptor  interleukin 1             SYNJ1                ABCA1 
CCND1              AGTR2                 matrix metalloproteinase  INPP1                OLR1 
EGF                NPPA                  interferon                NEDD4L               collagen 
MYC                LVM                   CD68                      FRA13C               MCP 
insulin-like       DBH                   IL4                       transducer of ERBB2  lipoprotein 
BCL2               NPY                   IL17                      BAIAP3               APOA2 
mucin              POMC                  MMP3                      ATP1B3               intercellular adhesion molecule 
FGF3               neuropeptide          SIL                       DRD5                 RAB27A 

* MedGene results for the top 25 genes associated with breast neoplasm, hypertension, rheumatoid arthritis, bipolar disorder, and atherosclerosis, respectively, 
ranked by LPF score. The complete list for each disease is available at the MedGene website (http://hipseq.med.harvard.edu/MedGene/). 



Discussion 

The Human Genome Project heralded a new era in biological 
research where the emphasis on understanding specific path- 
ways has expanded to global studies of genomic organization 
and biological systems. High-throughput technologies can 
provide novel insight into comprehensive biological function 
but also introduce new challenges. The utility of these 
technologies is limited by the ability to generate, analyze, and 
interpret large gene lists. MedGene, a relational database 
derived by mining the information in Medline, was created to 
address this need. MedGene users can query for a rank-ordered 
list of human gene-disease relationships (Table 2) for one or 
more diseases. Each entry is hyperlinked to the original papers 
supporting each association and to other relevant databases. 

MedGene is an innovative extension of previous text mining 
approaches. Perez-Iratxeta et al. used GO annotation and 
the chromosomal locations of genes to predict genes that may 
contribute to inherited disorders. 8 MedGene takes a broader view 
and includes all diseases and all possible gene-disease relation- 
ships. Furthermore, MedGene utilizes co-citation to indicate a 
relationship rather than GO annotation, which is limited to the 
subset of genes that have GO annotation. Our approach is 
complementary to that taken by Chaussabel and Sher, who 
used the frequency of co-cited terms to cluster genes into a 
hierarchy of gene-gene relationships. 6 

A unique aspect of this tool is the ability to assess the relative 
strengths of gene-disease relationships based on the frequency 
of both co-citation and single citation. This presupposes that 
most co-citations describe a positive association, often referred 
to as publication bias, 15 and is supported by our observations 
that negative associations are rare (Supplemental Table 3: 
http://hipseq.med.harvard.edu/MedGene/publication/s_Table3.html). 
Of course, relationships established by frequency 
of co-citation do not necessarily represent a true biological link; 
however, it is strong evidence to support a true relationship. 

Another important feature of MedGene is the implementa- 
tion of software filters that substantially reduced the error rate. 
We estimate that less than 10% of all associations were missed 
and at least 70% of even the weakest associations were real. 
For this study, all of the filters that we applied were general 
ones, e.g., expanding the list of all gene names to address the 
different syntax forms used by different journals, eliminating 
gene names that correspond to common English words, etc. 
The majority of the remaining search term ambiguities were 
idiosyncratic and difficult to identify systematically without 
causing a significant rise in false negatives. Alternative ap- 
proaches, such as the examination of the nearest neighbor 
terms, need to be considered to further reduce the false positive 
rate. 
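
The two general filters named above (expanding gene names into alternative syntax forms, and dropping names that collide with common English words) might look roughly like the sketch below. The stop list, the variant rules, and the function names are all hypothetical; the text does not describe the actual MedGene filters in this detail.

```python
import re

# Hypothetical stop list; a real filter would use a full English dictionary.
COMMON_WORDS = {"was", "can", "for", "not", "and", "set"}


def name_variants(symbol):
    """Return simple spelling variants of a gene symbol.

    Only one illustrative rule is applied: a symbol made of letters followed
    by digits (e.g. IL6) also matches its hyphenated and spaced forms.
    """
    variants = {symbol}
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", symbol)
    if match:
        letters, digits = match.groups()
        variants.add(f"{letters}-{digits}")
        variants.add(f"{letters} {digits}")
    return variants


def searchable_terms(symbols):
    """Map each usable gene symbol to its search variants.

    Symbols that are also common English words are dropped, since searching
    for them would mostly return false positives.
    """
    terms = {}
    for symbol in symbols:
        if symbol.lower() in COMMON_WORDS:
            continue
        terms[symbol] = name_variants(symbol)
    return terms


if __name__ == "__main__":
    print(searchable_terms(["IL6", "WAS", "TNF"]))
```

A filter of this kind trades false positives for false negatives: dropping an ambiguous symbol removes spurious hits but also discards any real papers that use it, which is the tension the paragraph above describes.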

It is not uncommon to see expression changes in micro- 
array experiments as small as 2-fold reported in the literature. 
Even when these expression changes are statistically significant, 
it is not always clear if they are biologically meaningful. When 
comparing expression levels of disease to normal tissue, one 
expects an enrichment of known disease-related genes to 
appear in the altered expression group. MedGene provided a 
unique opportunity to test this notion in the context of existing 
knowledge on a novel breast cancer micro-array dataset. For 
genes displaying a 5-fold change or less in tumors compared 
to normal, there was no evidence of a correlation between 
altered gene expression and a known role in the disease. This 
reflects the many genes whose role in breast cancer may not 
involve large changes in expression in sporadic tumors (e.g., 
BRCA1 and BRCA2) and genes whose modest changes in 
expression may be unrelated to the disease. Strikingly, among 
genes with a 10-fold change or more in expression level, there 
was a strong and significant correlation between expression 
level and a published role in the disease, providing the first 
global validation of the micro-array approach to identifying 
disease-specific genes. 

Table 3. Genes with Large Expression Changes in ER- but 
Not in ER+ Breast Tumors 

gene symbol   fold change (ER+)   fold change (ER-) 

KRTHB1          1.0                 610.8 
BRS3            1.2                  89.4 
DKK1            1.2                  69.8 
ZIC1            1.9                  59.6 
TLR1            1.0                  38.5 
KIAA0680        2.6                  33.2 
CDKN3           1.0                  30.6 
EBI2            4.0                  27.9 
GZMB            3.8                  21.9 
STK18           4.7                  18.6 
GPR49           1.0                  14.6 
MYOW            1.6                  14.4 
LAD1           -1.0                  13.5 
POLE2           4.2                  13.0 
HMG4            4.4                  12.9 
BCL2L11        -1.2                  12.3 
LRP8            2.9                  12.2 
CCNB2           1.0                  11.8 
CCNE2           4.0                  11.6 
FOB            -4.3                  11.1 
KNSL6         * 2.9                  10.9 
H1F5            3.0                  10.2 
SERPINH2        4.6                  10.2 
YAP1            1.0                  10.0 
LPtlB          -1.3                 -10.4 
TCEA2          -1.1                 -10.8 
TFF1            1.3                 -11.4 
COU7A1         -4.1                 -15.7 
POPS'           1.1                 -16.2 
BPAG1          -4.6                 -22.3 
PDZK1          -1.1                 -36.8 
VEGFC          -2.8                 -51.5 
MUC6           -1.4                 -64.9 
SERPINA5       -1.0                 -83.1 
MEIS1          -1.6                 -85.9 
OA It           2.4                -150.3 

MedGene identified a set of relatively understudied, yet highly 
expressed genes in ER negative, but not ER positive breast tumors. All of 
these genes have either never been co-cited with breast cancer or have a 
weak association except those marked with an *. 

The results derived from MedGene have two implications. 
First, a careful hunt for corroborating evidence of a role in 
breast cancer should precede any further study of genes with 
less than 5-fold expression level changes. Second, any genes 
with 10-fold changes or more are likely to be related to breast 
cancer and warrant attention. It is likely that this threshold will 
change depending on the disease as well as the experiment. 

Interestingly, the observed correlation was only found among 
ER-positive tumors, not ER-negative. This may reflect a bias 
in the literature to study the more prevalent type of tumor in 
the population. Furthermore, this emphasizes that caution 
must be taken when interpreting experiments that may contain 
subpopulations that behave very differently. The MedGene 
approach identified a set of relatively understudied, yet highly 
expressed genes in ER-negative tumors that are worthy of 
further examination (Table 3). 

In conclusion, we have developed an automated method of 
summarizing and organizing the vast biomedical literature. To 
our knowledge, the resulting database is the most comprehen- 
sive and accurate of its kind. By generating a score that reflects 
the strength of the association, it provides an important tool 
for the rapid and flexible analysis of large datasets from various 
high-throughput screening experiments. Furthermore, it can 
be used for selecting subsets of genes for functional studies, 
for building disease-specific arrays, for looking at genes com- 
mon to multiple diseases and various other high-throughput 
applications. In the future, it will be possible to enhance the 
utility of the MedGene database by building links between 
genes and other MeSH terms as well as other biological 
processes and concepts, such as cell division and responses to 
small molecules. 



Genetic Instability in Epithelial 
Tissues at Risk for Cancer 

Abstract: Epithelial tumors develop through a multistep process driven by 
genomic instability frequently associated with etiologic agents such as pro- 
longed tobacco smoke exposure or human papilloma virus (HPV) infection. 
The purpose of the studies reported here was to examine the nature of genomic 
instability in epithelial tissues at cancer risk in order to identify tissue genetic 
biomarkers that might be used to assess an individual's cancer risk and 
response to chemopreventive intervention. As part of several chemoprevention 
trials, biopsies were obtained from risk tissues (i.e., bronchial biopsies from 
chronic smokers, oral or laryngeal biopsies from individuals with premalig- 
nancy) and examined for chromosome instability using in situ hybridization. 
Nearly all biopsy specimens show evidence for chromosome instability 
throughout the exposed tissue. Increased chromosome instability was observed 
with histologic progression in the normal to tumor transition of head and neck 
squamous cell carcinomas. Chromosome instability was also seen in premalig- 
nant head and neck lesions, and high levels were associated with subsequent 
tumor development. In bronchial biopsies of current smokers, the level of 
ongoing chromosome instability correlated with smoking intensity (e.g., 
packs/day), whereas the chromosome index (average number of chromosome 
copies per cell) correlated with cumulative tobacco exposure (i.e., pack-years). 
Spatial chromosome analyses of the epithelium demonstrated multifocal clonal 
outgrowths. In former smokers, random chromosome instability was reduced; 
however, clonal populations appeared to persist for many years, perhaps 
accounting for continued lung cancer risk following smoking cessation. 

Keywords: chromosome instability; epithelial cells; aerodigestive tract; 
chemoprevention; cancer risk 



THE NEED FOR BIOMARKERS OF CANCER RISK AND 
RESPONSE TO INTERVENTION 

Epithelial cancers remain a major health challenge in the world. Despite improve- 
ments in staging and the application and integration of surgery, radiotherapy, and 
chemotherapy, the 5-year survival rate for individuals with lung cancer is only about 
15%. 1 Even if strategies for early detection are successful and lung cancers 
are detected at a stage where local tumor resection and treatment is curative, 
these patients will still be at significant risk for developing second primary tumors 
associated with the problem of field cancerization. 2 Similarly, for individuals with a 
first head and neck primary tumor, even if the first malignancy is successfully treat- 
ed, the risk of developing a second primary in the tobacco smoke-exposed field is 
approximately 40%. 3 Similar cancer risk estimates exist for individuals who exhibit 
severe dysplasia in premalignant epithelial lesions. 4 For these reasons, it is important 
to focus on chemopreventive strategies to prevent the development of epithelial 
malignancies. 

Several problems confront chemoprevention trials designed to identify effica- 
cious agents. 5 First, chemoprevention trials with cancer incidence as a primary end- 
point require tens of thousands of subjects and tens of years of intervention and 
follow-up for statistical evaluation. For example, a recently reported trial involved 
30,000 subjects and required 10 years in order to examine the impact of prevention 
strategies on lung cancer development, only to find a possible increased lung cancer 
incidence in current smokers who received β-carotene. 6 

The problem of large, long-term trials results from the difficulty in identifying 
individuals at highest cancer risk who might best benefit from chemopreventive 
intervention. For example, 20 pack-year smokers, while known to be at relatively 
increased risk for developing lung cancer, have approximately a 10% lifetime risk 
for developing lung cancer. 7 This seriously limits the number of potentially useful 
strategies that can be clinically explored. A second problem facing chemoprevention 
trials is that little is known about what agents are likely to have efficacy, and even 
less is known regarding proper doses, schedules, and durations of treatment. Part of 
the reason for this problem is that too little is known about the physiologic processes 
that drive epithelial cancer development. 

In order to reduce the number of subjects and the time required to carry out 
chemoprevention trials and thus allow the exploration of multiple prevention strate- 
gies, two types of advances are necessary. First, it is important to identify individuals 
at significantly increased cancer risk who might best benefit from different types of 
intervention. Second, in order to allow the rapid identification of agents, doses, and 
schedules of potentially efficacious agents, it is necessary to identify and validate 
surrogate endpoints of response that indicate whether the agents are having a posi- 
tive impact on the target tissue during the chemopreventive intervention. 

One approach to identifying individuals at increased aerodigestive tract cancer 
risk is to explore epidemiologic features of potential subjects. Molecular epidemio- 
logic studies are beginning to identify intrinsic host factors that place some individ- 
uals at increased cancer risk, especially those with a chronic smoking history. 8 Most 
intrinsic factors identified thus far reflect levels of carcinogen metabolism, repair 
capabilities of the host following DNA damage, and other measures of intrinsic 
cellular sensitivity to mutagens. While these factors can provide statistically signif- 
icant risk ratios in case-control studies that are controlled for tobacco exposure, the 
detected risk ratios usually fall in the range of 1.5 to 10. Unfortunately, this is not 
sufficient for the individualization of treatment and is not sufficiently high to signif- 
icantly reduce the numbers of subjects required for chemoprevention trials with 
cancer incidence as the primary endpoint. 

Another approach to identifying individuals at increased cancer risk is to directly 
examine the target tissue of individuals with known carcinogen exposure (e.g., 
chronic tobacco smoke exposure), who have evidence of target organ dysfunction 
(e.g., chronic obstructive pulmonary disease, changes in voice quality), or who 
have clinical evidence of premalignancy (e.g., bronchial metaplasia/dysplasia, oral 
leukoplakia/erythroplakia, cervical intraepithelial neoplasia). The conventional 
standard for assessing cancer risk in these situations is the degree of histological 
change. However, while individuals who show moderate to severe dysplasia are 
known to be at increased cancer risk when compared to individuals with lesser his- 
tologic changes, it is often difficult to distinguish reactive changes to carcinogenic 
insult from initiated and progressing lesions. Similarly, upon cessation of carcino- 
genic insult, histologic changes may reverse yet cancer risk may continue for many 
years. For example, while smoking cessation is associated with decreased bronchial 
metaplasia, 9 increased lung cancer risk continues for many years beyond smoking 
cessation. 10 In fact, nearly half the newly diagnosed lung cancer cases in the USA 
occur in former smokers. 11 

The development of assays to identify individuals at high epithelial cancer risk 
and to directly assess response to intervention in the target tissue is therefore an 
important research goal. Such assays should be objective and easily quantifiable and, 
if possible, minimally invasive. Moreover, they should reflect both the disease pro- 
cess and the targeted pathway and thereby be useful in assessing risk and monitoring 
response to intervention as well as directly testing the hypothesized mechanism of 
action of the chemopreventive strategy. 

In the chemoprevention setting it is important to recognize that one does not 
know the location of the future cancer. Thus, assays must necessarily be carried out 
on random biopsies of the field at risk. Even if there are clinically evident premalig- 
nant lesions, this does not mean that this is the likely site for a future malignancy. 
For example, nearly half of the cancers that develop in individuals with oral leuko- 
plakia arise away from the original index lesion. Similarly, since many newly diag- 
nosed lung cancers arise in the peripheral parts of the lung (e.g., adenocarcinomas), 
especially in former smokers, and since endobronchoscopy predominantly accesses 
central components of the lung, it is important to identify biomarkers that can reflect 
global processes ongoing in the target epithelial field associated with increased 
cancer risk. Their discovery requires a better understanding of the tumorigenesis 
process in epithelial fields at cancer risk. 



THE RATIONALE FOR STUDYING 
GENOMIC INSTABILITY AS A MARKER OF RISK 

Tumors of the aerodigestive tract have been proposed to reflect a "field cancerization" 
process whereby the whole tissue is exposed to carcinogenic insult (e.g., tobacco 
smoke) and is at increased risk for multistep tumor development. 12,13 Several 
types of clinical and laboratory data support this notion, including the frequent 
occurrence of synchronous primary and subsequent second primary tumors in the 
aerodigestive tract (frequently exhibiting dissimilar histologies as well as distinct 
genetic signatures 14-16 ) and the presence of premalignant lesions that precede and/or 
accompany the tumor in the exposed tissue field. 17 The notion of a multistep 
tumorigenesis process is further supported by serial clinical and histologic evaluations of 
target tissue or exfoliated cells where increasing degrees of histological abnormalities 
are observed over time. 18 

A working model for aerodigestive tract tumorigenesis is illustrated in FIGURE 1. 
Tumorigenesis in the face of carcinogenic exposure likely involves a chronic process 
of tissue injury and wound healing. DNA damage induced by the carcinogen is likely 
fixed into permanent genetic changes (e.g., chromosome damage, chromosome non- 
disjunction, gene mutation, gene deletion, etc.) during the process of proliferation. 
This damage would be expected to be distributed throughout the exposed tissue field 
leading to a background of generalized genomic damage (depicted in Figure 1 as a 
background mat of increasing density). Chronic injury and repair likely leads to the 
accumulation of cells with increasing amounts of genetic changes as well as the 
outgrowth of abnormal clones (triangles in Figure 1) carrying an accumulation of 
genetic changes important for selective survival, dysregulated growth, and preferen- 
tial epithelial take-over by initiated clones (see Figure 2). 

Cellular and molecular evidence for the field carcinogenesis and multistep 
tumorigenesis model comes from many laboratories. 19,20 With the advent of a wide array 
of molecular technologies, a large number of specific molecular genetic and epige- 
netic changes involving specific oncogenes, tumor suppressor genes, cell regulatory 
genes, and repair genes have now been described for aerodigestive tract cancers. The 
identification of these specific molecular changes has now provided probes to 
explore specific events occurring in premalignant lesions adjacent to aerodigestive 
tract tumors. 21-24 Frequently, these premalignant lesions showed a subset of the 
same molecular changes found in the associated tumor, suggesting that these lesions 
might represent precursor lesions for the associated tumors (i.e., a manifestation of 
a multistep tumorigenesis process). For example, studies of the premalignant lesions 
adjacent to head and neck tumors have provided evidence for a gradual accumulation 
of genetic alterations accompanied by evidence for dysregulation of cellular control 
mechanisms (e.g., alterations in expression of PCNA, EGFR, TGF-β, p53, and 
cyclin D1). 25-28 

FIGURE 1. Field cancerization and multistep tumorigenesis. 

FIGURE 2. Multiple focal clonal evolution during multistep tumorigenesis. 

These types of studies have now also been applied to the target epithelium of indi- 
viduals at increased risk for aerodigestive tract cancer (i.e., individuals with a chron- 
ic smoking/alcohol history and/or prior aerodigestive tract cancer). Several groups 
(using polymerase chain reaction, PCR, analysis of microdissected epithelium) have 
now demonstrated the presence of clonal outgrowths in the target premalignant epi- 
thelium of individuals at increased risk for cancer. 29-31 For example, examination of 
bronchial biopsies derived from individuals with a 20 pack-year smoking history 
demonstrated that 76% of the cases showed evidence for LOH (3p14, 9p21, or 
17p13) in at least one of six lung biopsy sites. On a per site basis, some form of LOH 
was observed in 25% of the sites examined. 29 

If aerodigestive tract cancer development reflects a field cancerization process 
involving multistep events, then risk and response information should be able to be 
derived from random biopsies or exfoliated cells from the field at risk or from assess- 
ments of tissue undergoing similar processes. Hypothetically, lesions exhibiting the 
greatest degree of genomic instability, clonal outgrowth, and abnormal epithelial 
regulation would be at the highest relative aerodigestive tract cancer risk. Similarly, 
an active chemopreventive intervention might be expected to decrease these mani- 
festations of risk. Reduced risk manifestations include decreased levels of ongoing 
genetic instability, decreased frequency of clonal outgrowths, and increased epithe- 
lial growth regulation. 



THE MEASUREMENT OF CHROMOSOME INSTABILITY USING 
CHROMOSOME IN SITU HYBRIDIZATION 

Molecular genetic techniques, while extremely useful for detecting clonal changes 
in target tissues, are somewhat limited in their ability to detect random genetic 
instability. Conventional cytogenetic assays are useful for detecting chromosome 
instability and clonal chromosome changes. However, they require numbers of 
dividing cells for karyotypic analysis that are difficult to attain in the setting of biop- 
sies acquired during the course of a chemoprevention trial. A technique was there- 
fore needed that would allow chromosome instability measurements in situations 
where few cells are available (e.g., small biopsies, brushings, or sputum samples) and 
where the target material might be fixed. It was also desirable to have a technique 
that would be adaptable to tissue sections, whereby spatial information could be 
retained and genotype/phenotype associations could be determined on the same or 
adjacent tissue sections. The technique of in situ hybridization (ISH) involves the 
use of DNA probes that recognize either chromosome-specific repetitive target 
sequences, chromosome single gene copy sequences, or sequences along the whole 
chromosome length or chromosome segments. 32 We have adapted the ISH technique 
for formalin-fixed, paraffin-embedded tissue sections and have applied it to a variety 
of tissues, including the aerodigestive tract. 33,34 

Using probes that label the centromere regions of specific chromosomes, this 
assay permits determination of the average chromosome number per cell for each 
specimen. This assay is also useful for detecting generalized chromosome instability 
during the tumorigenesis process. Normal diploid populations should have two cop- 
ies of each autosomal chromosome and should rarely show three or more chromo- 
some copies per cell (chromosome polysomy), especially in tissue sections where 
nuclear truncation results in an under-representation of chromosome copy number. 
Thus, the detection of cells with three or more chromosome copies would indicate 
the presence of chromosome instability. 
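
The two per-specimen measurements described here can be written down as a short Python sketch: the chromosome index (average number of chromosome copies per cell) and the polysomy fraction (share of cells with three or more copies). The input format, a list of signal counts per scored nucleus for one centromere probe, and the function names are assumptions made for illustration.

```python
def chromosome_index(copy_counts):
    """Average number of chromosome copies (probe signals) per cell."""
    return sum(copy_counts) / len(copy_counts)


def polysomy_fraction(copy_counts, min_copies=3):
    """Fraction of cells showing `min_copies` or more chromosome copies."""
    polysomic = sum(1 for count in copy_counts if count >= min_copies)
    return polysomic / len(copy_counts)


if __name__ == "__main__":
    # Hypothetical signal counts for ten nuclei scored with one probe.
    # Counts below two are expected in sections because of nuclear truncation.
    counts = [2, 2, 1, 2, 3, 2, 4, 1, 2, 2]
    print("chromosome index:", chromosome_index(counts))
    print("polysomy fraction:", polysomy_fraction(counts))
```

Because truncation in tissue sections pushes counts downward, the polysomy fraction computed this way is, if anything, an underestimate of the true share of cells with extra copies.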

To examine this technique's potential for characterizing the multistep tumorigen- 
esis process in the aerodigestive tract, we measured the fraction of cells exhibiting 
three or more chromosome copies in apparently contiguous epithelial transitions 
from normal to hyperplastic to dysplastic to carcinomas, all on a single tissue slice 
of head and neck squamous cell carcinomas. 34 In these specimens, greater than 35% 
of the cases of adjacent "normal" epithelium, greater than 65% of the cases of hyper- 
plastic epithelium, and greater than 95% of the dysplastic and tumor regions showed 
evidence of chromosome polysomy. Of interest, similar transitions of chromosome 
instability were observed with at least four different chromosome probes. Similar 
trends have also been observed in amenable tissue from other epithelial malignan- 
cies, including cervix, bladder, and breast. 35 These results thus suggested that the 
notions of field cancerization and multistep tumorigenesis might apply to several 
epithelial tissues and that measures of chromosome instability might be useful for 
monitoring this process. 

In the situations described above, the premalignant lesions examined might be 
considered to represent epithelium at 100% risk of being in a cancer field, since they 
were located in the adjacent epithelium to the cancer. This then raises the question 
of the nature of genetic instability in the epithelium of individuals at increased risk 
for developing cancer. To explore this issue, we obtained biopsies during the course 
of leukoplakia chemoprevention trials exploring the use of 13-cis-retinoic acid in 
reversing leukoplakia and probed them for genetic instability using in situ hybridiza- 
tion. In one retrospective study and in one prospective study of subjects with oral 
leukoplakia, the results indicate that those subjects whose pretreatment biopsies har- 
bor relatively high levels of genomic instability (i.e., more than 3% of the cells 
examined showing at least 3 chromosome 9 copies per cell) have a significantly 
higher likelihood of suffering early onset of head and neck cancer. 36,37 Interestingly, 
half of the tumors that did develop occurred away from the biopsy site used to mea- 
sure genetic instability. This result suggests that genomic instability measurements 
in carcinogen-exposed tissue can provide useful cancer risk estimates. 



THE RELATIONSHIP BETWEEN TOBACCO EXPOSURE AND 
CHROMOSOME INSTABILITY 

In recent years, the aerodigestive tract chemoprevention group at M.D. Anderson 
Cancer Center has initiated three sequential biomarker-associated chemoprevention 
trials involving chronic smokers with a greater than 20 pack-year smoking history. 
In each of these studies, endobronchial biopsies were obtained from six defined sites 
within the lung, including the carina and at bifurcation points at the upper, middle, 
and lower right lung and at the upper and lower left lung. Biopsies were obtained
prior to and following chemopreventive intervention and were subjected to in situ
hybridization analysis in addition to analyses for other biomarkers. The first
important finding was that some degree of chromosome polysomy was evident in all lung
sites examined, and this was observed independently of the particular chromosome
probe utilized. 38 This finding supports the notion that random chromosome changes 
may be occurring throughout the exposed lung field. 

In a second study, bronchial biopsies were obtained from individuals with a 20 
pack-year smoking history. In this study, most of the subjects involved were current 
smokers. 39 Interestingly, all cases who showed metaplasia at one of six biopsy sites 
also showed chromosome polysomy in at least one biopsy site; overall, 88% of the 
sites showed some evidence of chromosome 9 polysomy. 40 Evidence for genetic 
instability was also detected in patients who did not show evidence of bronchial 
metaplasia in any of six biopsy sites despite a strong smoking history. In fact, more 
than 90% of the cases and more than 60% of the sites showed significant chromosome
polysomy (i.e., at least three copies in at least 2% of the cells examined).
These results suggest that the lungs of long-term smokers show significant evidence 
of genetic instability, and this instability can be detected throughout the accessible 
bronchial tree, even when bronchial metaplasia is not evident. 

These studies in current smokers have allowed us to examine the relationship
between the levels of genetic instability detected and subject characteristics such as 
smoking status (current or former), smoking history, and lung tissue pathologic 
changes. Evaluable biopsy material has now been obtained from more than 108 current
smokers, including more than 480 evaluable biopsy sites. The mean metaplasia
index in these current smokers was 30.4%. For the total population studied, the 
median chromosome index for the bronchial biopsies was 1.41 (range, 1.04-1.61)
and the median chromosome polysomy index was 2.0% (range 0-8.7%). This can be
compared to a mean chromosome index between 1.2-1.4 for lymphocytes and very 
rare chromosome polysomy. Interestingly, the intrasubject variability in chromo- 
some instability was relatively low in most subjects and was less than the intersub- 
ject variability. These results suggested that chronic smokers harbor detectable 
chromosome instability throughout the accessible bronchial tree (supporting the 
field carcinogenesis notion) and that information from one biopsy site might yield 
representative information for the rest of the lung field. 
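
As an illustration of the two summary measures quoted above, the following minimal sketch computes a chromosome index and a polysomy index from per-cell signal counts. It assumes that the chromosome index is the mean number of hybridization signals per scored cell and that the polysomy index is the percentage of cells with three or more signals. The function names and the sample counts are hypothetical and are not taken from the study.

```python
from statistics import mean

def chromosome_index(copies_per_cell):
    """Mean number of hybridization signals per scored cell (assumed definition)."""
    return mean(copies_per_cell)

def polysomy_index(copies_per_cell, min_copies=3):
    """Percentage of scored cells showing at least `min_copies` signals."""
    polysomic = sum(1 for c in copies_per_cell if c >= min_copies)
    return 100.0 * polysomic / len(copies_per_cell)

# Hypothetical signal counts for 200 cells scored in one biopsy section.
counts = [1] * 110 + [2] * 86 + [3] * 3 + [4] * 1
ci = chromosome_index(counts)   # mean signals per cell
pi = polysomy_index(counts)     # percent of cells with >= 3 signals
```

With counts like these, a tissue section can show a chromosome index well below 2 (nuclei are truncated by sectioning) while still containing a measurable fraction of polysomic cells.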

Since most of the current smokers exhibited bronchial metaplasia in at least one 
of the biopsied sites, this allowed us to examine the relationship between chromo- 
some instability and histologic changes, both on a site-by-site basis and on a per case 
basis. On a site-by-site basis, the chromosome indices of lesions showing squamous 
metaplasia were similar to those not showing metaplasia (i.e., median 1.43 vs. 1.43),
and the degree of chromosome polysomy in metaplastic lesions was only slightly
higher than in non-metaplastic sites (medians: 2.2% vs. 1.8%, respectively). Thus,
the presence or absence of squamous metaplasia at a biopsy site does not necessarily 
correlate with the degree of underlying genomic instability. On the other hand, those 
subjects with metaplasia indices of at least 15% also showed higher levels of chromosome
polysomy than did subjects with a metaplasia index below 15% (medians:
2.4% vs. 1.8%, p = 0.005). Thus, these chromosome instability assessments in cur- 
rent smokers appeared to reflect a more global process in the lung field. 

Tobacco exposure has been shown to significantly increase the risk of developing 
lung cancer, and the degree of risk is related to the extent of tobacco exposure. We 
were interested in determining the relationship between individuals' smoking history
parameters and the levels of chromosome change found in their lungs following
years of tobacco exposure. While there was significant intersubject variation for sim- 
ilar tobacco exposure histories, overall there was a significant correlation between 
the degree of chromosome polysomy and the intensity of ongoing tobacco exposure 
(packs/day, p = 0.02 on a per site basis) and with the extent of tobacco exposure
(pack-years, p = 0.003). Thus the amount of chromosome polysomy reflects the 
intensity and extent of tobacco exposure. At the same time, individuals with similar 
smoking histories showed widely divergent amounts of chromosome polysomy, pos- 
sibly reflecting differences in intrinsic sensitivity between subjects. There was also 
a strong correlation between the chromosome index and the duration of the smoking
history (smoking years) and total accumulated exposure (pack-years, p = 0.0001).
These results suggest that tobacco exposure is associated with the initiation and 
accumulation of chromosome instability in the exposed lung; however, individuals
are differentially sensitive to carcinogenic insult. The working hypothesis is that 
those individuals who accumulate the highest degree of chromosome changes will 
be at the highest lung cancer risk. 

Many of the bronchial biopsies from chronic smokers examined by in situ hybridization
showed a rise in the chromosome index above that expected for a diploid cell
population, especially in subjects with an extensive smoking history. The rise in 
chromosome index was also accompanied by an increase in the fraction of cells 
exhibiting at least 3 chromosome copies per cell. To determine if a rise in the tissue 
chromosome index was due to clonal expansion of populations with chromosome trisomy,
the chromosome copy number and relative coordinates of each cell scored in
the bronchial epithelium were recorded and a spatial genetic map was created. 41 We
then developed algorithms for calculating localized chromosome indices within the 
tissue. Since trisomic clones would have, on average, three chromosomes instead of 
two, those cells involved in neighborhoods with chromosome indices three-halves 
that of diploid populations could be marked as being part of a trisomic clone. Simi- 
larly, groups of cells with chromosome indices half that of diploid populations could 
be marked as being part of a monosomic clone. This allowed the generation of a sec- 
ond-order, two-dimensional genetic map representation of the bronchial epithelium 
showing the relative locations of cells involved in monosomic and trisomic clonal 
outgrowths. When adjacent tissue sections from the same bronchial biopsy were
probed separately for different chromosomes, the detected clones appeared to occu- 
py separate subregions of the epithelium. This result suggests that not only are the 
lungs of chronic smokers undergoing a process of genetic instability, they are expe- 
riencing the outgrowth of multiple clones throughout the exposed lung field, as pos- 
tulated by the models shown in Figures 1 and 2. One advantage of this clonal 
approach is that the contribution of both monosomic and multisomic clones can be
detected. 
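
The clonal mapping idea described above can be sketched as follows. This is only an illustrative reconstruction under stated assumptions, not the published algorithm: each cell is a hypothetical (x, y, copies) tuple, a neighborhood is every cell within a fixed radius, and the tolerance `tol` around the three-halves and one-half ratios is an invented parameter.

```python
import math

def local_index(cells, center, radius):
    """Mean copy number of all cells within `radius` of `center` (inclusive)."""
    cx, cy = center
    nearby = [c for (x, y, c) in cells if math.hypot(x - cx, y - cy) <= radius]
    return sum(nearby) / len(nearby)

def classify_cells(cells, diploid_index, radius, tol=0.15):
    """Label each cell by comparing its neighborhood index with the diploid reference.

    A neighborhood near 1.5x the reference is marked 'trisomic', one near
    0.5x is marked 'monosomic', and everything else 'diploid'. `tol` is an
    assumed tolerance on the ratio, not a value from the text.
    """
    labels = []
    for (x, y, _) in cells:
        # The cell itself always lies in its own neighborhood, so it is never empty.
        ratio = local_index(cells, (x, y), radius) / diploid_index
        if abs(ratio - 1.5) <= tol:
            labels.append("trisomic")
        elif abs(ratio - 0.5) <= tol:
            labels.append("monosomic")
        else:
            labels.append("diploid")
    return labels

# Hypothetical map: three well-separated clusters of three cells each.
cells = ([(x, 0, 3) for x in range(3)]
         + [(x, 0, 2) for x in range(10, 13)]
         + [(x, 0, 1) for x in range(20, 23)])
labels = classify_cells(cells, diploid_index=2.0, radius=3.0)
```

In sectioned tissue the diploid reference would be the index actually observed for a diploid population rather than exactly 2, which is why it is passed in as a parameter here.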

Since smoking cessation has been suggested to reduce the lung cancer risk, it was 
of interest to determine whether the levels of chromosome instability would decrease 
following smoking cessation. It was possible to examine this question because our
third sequential chemoprevention trial involved subjects who had discontinued
smoking. So far, more than 220 subjects (more than 650 biopsies) who have quit 
smoking (mean 9.9 quit-years) have been evaluated for chromosome instability in 
their lungs. Despite the fact that the mean metaplasia index in this group is 5.8% 
(considerably less than that in current smokers), chromosome instability is still 
observed in the majority of subjects. 42 While the mean chromosome polysomy level 
is reduced to 1.0%, some individuals continue to show polysomy levels above 5%. 
Interestingly, while the overall chromosome polysomy levels were reduced in these 
individuals who stopped smoking, the mean chromosome index remained at about 
1.4 with some individuals exhibiting chromosome indices as high as 1.8. Initial chromosome
mapping studies suggest that while random chromosome instability seems
to decrease following smoking cessation, the clonal outgrowths may remain for 
many years in the lung. The working hypothesis is that those individuals who show 
the greatest degree of remaining chromosome instability are at the highest lung can- 
cer risk despite smoking cessation. Long-term follow-up on these subjects will be 
necessary to test this hypothesis. 



SUMMARY AND CONCLUSIONS 

Aerodigestive tract tumorigenesis appears to be a multistep process taking place 
throughout the tissue fields of exposure. When viewed in the context of chromosome 
changes, carcinogen exposure appears to be associated with the random acquisition 
of chromosome polysomy throughout the exposed field, the degree of which is related
to the degree and extent of carcinogen exposure as well as to the intrinsic susceptibility
of the exposed individual. Continued exposure leads to continued acquisition
of new changes and, in association with chronic wound-healing processes, to the
accumulation of clonal outgrowths throughout the target tissue. Although the ultimate
malignancy may occur in only one or few tissue sites, manifestations of the
instability process that drives tumorigenesis are globally present in the tissue. Thus
random biopsies may provide useful risk information for the exposed field as a 
whole. Even when carcinogen exposure is reduced or chemopreventive strategies are 
initiated and histologic manifestations of the tumorigenesis process subside, the 
genetic scars of prior exposure remain in the form of clonal outgrowths and may 
explain continued lung cancer risk in ex-smokers. Future chemoprevention strategies
need to focus on reducing the degree of chromosome instability and on trying to 
eliminate residual abnormal clonal outgrowths in the aerodigestive tract. In this setting,
the measurement of chromosome instability in the target tissue will be useful in
assessing cancer risk as well as response to intervention.

Bacterial lipopolysaccharide (LPS) evokes several
functional responses in the neutrophil that contribute 
to innate immunity. Although certain responses, such as 
adhesion and synthesis of tumor necrosis factor-α, are
inhibited by pretreatment with an inhibitor of p38 mi- 
togen-activated protein kinase, others, such as actin as- 
sembly, are unaffected. The aim of the present study was 
to investigate the changes in neutrophil gene transcrip- 
tion and protein expression following lipopolysaccha- 
ride exposure and to establish their dependence on p38 
signaling. Microarray analysis indicated expression of 
13% of the 7070 Affymetrix gene set in nonstimulated 
neutrophils, and LPS up-regulation of 100 distinct 
genes, including cytokines and chemokines, signaling 
molecules, and regulators of transcription. Proteomic 
analysis yielded a separate list of up-regulated modula- 
tors of inflammation, signaling molecules, and cytoskel- 
etal proteins. Poor concordance between mRNA tran- 
script and protein expression changes was noted. 
Pretreatment with the p38 inhibitor SB203580 attenuated
23% of LPS-regulated genes and 18% of LPS-regulated
proteins by ≥40%. This study indicates that p38
plays a selective role in regulation of neutrophil tran- 
scripts and proteins following lipopolysaccharide expo- 
sure, clarifies that several of the effects of lipopolysac- 
charide are post-transcriptional and post-translational, 
and identifies several proteins not previously reported 
to be involved in the innate immune response. 



Lipopolysaccharide (LPS), 1 a component of the outer cell wall 
of Gram-negative bacteria, evokes a variety of functional re- 
sponses in the human neutrophil (PMN) after binding to a 
plasma membrane receptor complex that involves the Toll-like
receptors (TLRs) (1-5). These "immediate" functional responses,
including actin assembly, adhesion, activation of nuclear
factor-kappa B (NF-κB), and priming for an enhanced
secretory response and for release of reactive oxygen interme- 
diates, appear to be central both to the innate immune re- 
sponse and to the pathogenesis of several inflammatory human 
diseases, including sepsis and the acute respiratory distress 
syndrome (6). p38 mitogen-activated protein kinase (p38 
MAPK) has been shown to mediate LPS-induced PMN adhesion,
NF-κB activation, and TNF-α and IL-8 translation and
release (7), and its blockade attenuates LPS-induced PMN 
accumulation in the airspace (8). However, other cascades al- 
most certainly lead to downstream effectors of the LPS signal; 
for example, actin assembly appears to be p38 MAPK-inde- 
pendent (9). An improved understanding of the transcriptional 
and translational responses of the neutrophil to LPS and the 
modulation of these responses by p38 MAPK might carry 
pathogenetic and therapeutic implications. 

Historically, it has been believed that the downstream PMN 
transcriptional response to LPS is static and that PMN func- 
tional responses to LPS that depend on de novo protein syn- 
thesis are primarily limited to the release of cytokines (10). 
However, recent studies indicate a robust transcriptional re- 
sponse (11). To date, most studies have relied upon and re- 
ported a short list of functional assays of the LPS-exposed 
PMN; therefore, no exhaustive investigation of either the tran- 
scriptional response or protein synthetic repertoire of the PMN 
has been reported. Although several techniques have been used 
to evaluate transcripts, the screening of global changes in 
mRNA by microarray analysis has only recently become possi- 
ble. In this way, thousands of genes can be screened in an 
unbiased fashion for transcript abundance. Such genomic 
screens in mammalian cells have previously been applied to 
define altered expression profiles in response to agonists (12) 
and to drug action (13) and during cell cycle progression (14). 

Although DNA microarray technology is expected to provide 
insight into the response of the human PMN to LPS (15), 
inhibition of LPS-stimulated IL-1 and TNF-α production by
p38 MAPK inhibitors in THP-1 cells (16) and of TNF-α synthesis
in human PMNs (9) occurs at a translational level and
would therefore not be detected by DNA microarrays. Further- 
more, in other systems, such as yeast and human liver, mRNA 
and protein levels show poor correlation (17, 18). Proteomics is 
a complementary tool for assessing global changes in cellular 
protein expression, thereby providing additional insight into 
cellular signal regulation. A proteomic approach has proven 
useful in different systems for dissecting signal transduction 
cascades and describing their output (19, 20) and has even
recently been used to detect novel upstream messengers involved
in LPS signal transduction (21). We have applied DNA
microarrays and proteomics to define and compare transcrip- 
tional and post-transcriptional alterations in the LPS-exposed 
PMN and to establish the dependence of these alterations on 
p38 MAPK signaling. 

EXPERIMENTAL PROCEDURES 

Materials — Endotoxin-free reagents and plastics were used in all 
experiments. Aprotinin, leupeptin, AEBSF, E-64, pepstatin, and bestatin
protease inhibitors, spermine HCl, and α-cyano-4-hydroxycinnamic
acid (CHCA) were all purchased from Sigma Chemical Co. (St. Louis,
MO). SB203580, a p38 MAPK inhibitor, was purchased from Calbiochem-Novabiochem
Corp. (San Diego, CA). For two-dimensional PAGE,
rehydration buffer, equilibration buffers, vertical electrophoresis solutions,
and 10% homogeneous polyacrylamide slab gels were purchased
from Genomic Solutions, Inc. (GSI, Ann Arbor, MI). Sequencing grade 
porcine trypsin was purchased from Promega (Madison, WI). 

LPS Incubation— PMNs were isolated by the plasma Percoll method 
(22), a technique that yields less than 5% monocytic contamination, and 
resuspended at a concentration of 15.4 × 10^6/ml in RPMI 1640 culture
medium (BioWhittaker, Walkersville, MD) supplemented with 10 mM
HEPES (pH 7.6) and 1% heat-inactivated platelet-poor plasma. After 
addition of 100 ng/ml Escherichia coli O111:B4 LPS (List Biological),
incubation was carried out with continuous rotation (4 h, 37 °C) both in 
the presence and absence of SB203580. Both Affymetrix analysis and 
proteomic analysis utilized 75 × 10^6 cells. For microarray analysis,
nonstimulated and 4-h-treated PMNs were collected from three sepa- 
rate donors. A more detailed time course following LPS exposure was 
performed using polymerase chain reaction. For proteomic analysis, 
LPS incubations from separate donors (n = 6) were performed and then 
analyzed individually. Control and post-LPS incubation PMNs were 
washed (0.34 M sucrose/1 mM EDTA/10 mM Tris) and then lysed in a 
modified rehydration buffer (GSI, Ann Arbor, MI) supplemented with 2 
M thiourea, 50 mM dithiothreitol (DTT), 22.5 mM spermine HCl, and a
mixture of six protease inhibitors (10 μg/ml aprotinin, 10 μg/ml leupeptin,
2 mM AEBSF, 5 μM E-64, 1 μM pepstatin, 10 μM bestatin). DNA was
pelleted by centrifugation at 250,000 × g for 60 min (23).

Affymetrix Oligonucleotide Array— Five micrograms of total RNA was 
isolated with TRIzol (Invitrogen) and RNeasy columns (Qiagen) and 
subsequently labeled with biotin as described by Affymetrix. Briefly, 
first-strand synthesis was accomplished with Superscript II reverse transcriptase
(Invitrogen) using a T7-oligo(dT) primer for 1 h at 42 °C
followed by second-strand synthesis using E. coli DNA polymerase I and 
RNase H (Invitrogen) at 16 °C for 2 h. Double-stranded DNA was used as 
a template for in vitro transcription with T7 RNA polymerase in the 
presence of biotin-labeled UTP and CTP using the BioArray High Yield 
RNA transcript labeling kit (Enzo). Fifteen micrograms of cRNA was 
fragmented and used for hybridization to Affymetrix HuGene 6800FL 
Genechips. Each sample was hybridized initially using a Test2 Genechip 
to test for sample degradation and full-length in vitro translation. Data 
were analyzed using Affymetrix Genechip software. Results from three 
separate donors were analyzed. 

Reverse Transcription and Polymerase Chain Reaction — cDNA was 
prepared by reverse transcription using 2 μg total RNA, derived from
20 × 10^6 cells that were treated as indicated. Polymerase chain reactions
were performed using specific primers for Mx-1, TNF-α, MCP-1,
p65, S100A4, and glyceraldehyde-3-phosphate dehydrogenase.

Two-dimensional PAGE — The protein concentration of the lysates 
was measured as described by Bradford et al. (24). Poor isoelectric
focusing (IEF) results were encountered unless the polycationic sperm- 
ine was diluted (data not shown); therefore, lysates were diluted with 
rehydration buffer (GSI, Ann Arbor, MI) to achieve a final spermine 
concentration of 6 mM. Equal protein loads (1.5 mg) of control and 
LPS-stimulated neutrophils were used to rehydrate IEF gels overnight 
(18 cm, pH 3-10 nonlinear Immobiline DryStrip IEF gels, Amersham 
Biosciences; Piscataway, NJ). IEF was performed at 20 °C to 100-kVh 
(Phaser, GSI) under mineral oil, followed by two 10-min SDS equilibra- 
tion steps (DTT and then iodoacetamide-containing equilibration buff- 
ers, GSI) and then by vertical electrophoresis on 10% homogeneous 
polyacrylamide slab gels (GSI) at 500 V. Protein spots were visualized 
by agitation in colloidal Coomassie Brilliant Blue G-250 (16 h) (25), 
followed by destaining in deionized water (20 h). In separate experi- 
ments, control and LPS-stimulated PMN lysates from three donors 
were pooled and then analyzed by two-dimensional PAGE using overlapping
narrow isoelectric point (pI) ranges (18 cm, pH 5.0-6.0, 5.5-6.7,
and 6-11, Amersham Biosciences, Piscataway, NJ). Identical IEF
and vertical electrophoresis parameters were used for all gels. 

Image Analysis of Two-dimensional Gels — Colloidal Coomassie- 
stained gels were digitized using a Powerlook II (UMAX Data Systems, 
Inc., Taiwan) flatbed scanner with 8-bit dynamic range and 150-dpi 
resolution. BioImage (GSI, Ann Arbor, MI) 2D-Analyzer software was
used to locate, quantitate, and match protein spots on the control and 
LPS gel images. Analysis was performed by assigning 50 common 
anchor spots between paired images; the remaining spots were compared
by a constellation-matching algorithm. All data were then carefully
reviewed by the operator to account for any discrepancies. Protein
loading between control and experimental gels may have varied be- 
cause of inconsistencies in rehydration of the different IEF gel strips; 
therefore, gel images were normalized so that the sum of the integrated 
intensities of all matched spots on paired gels was made equal. Control 
and LPS-stimulated gel images from individual donor experiments 
were matched to generate composite images; composite images were 
then matched into a master composite image to track the LPS response 
of protein spots among different donors (26). Only those spots that were 
common (image-matched) to all original 12 (pH 3.0-10.0) gels were
considered for further analysis. For these spots, the LPS-induced 
change in integrated intensity in the six experiments was subjected to 
statistical analysis with a two-tailed Student's t test, and those spots 
with p < 0.05 were identified by peptide mass fingerprinting (described 
below). For the narrow range (pH 5.0-6.0, 5.5-6.7, and 6-11) two- 
dimensional PAGE experiments using pooled donors, only those spots 
with concordant regulation exceeding 1.5-fold or that appeared de novo 
in the LPS gel in two repeat experiments were further analyzed.
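
The normalization step described above, in which paired gel images are scaled so that the summed integrated intensities of all matched spots are equal, might be sketched as follows. The spot identifiers and intensities are hypothetical, and scaling the LPS gel to the control total (rather than the reverse) is an assumption made for illustration.

```python
def normalize_pair(control, lps):
    """Scale the LPS gel so matched-spot intensities sum to the control total.

    `control` and `lps` map spot IDs to integrated intensities; only spots
    present in both gels (matched spots) enter the normalization.
    """
    matched = sorted(set(control) & set(lps))
    scale = sum(control[s] for s in matched) / sum(lps[s] for s in matched)
    return {s: (control[s], lps[s] * scale) for s in matched}

def fold_changes(normalized):
    """LPS/control intensity ratio for each matched spot after normalization."""
    return {s: l / c for s, (c, l) in normalized.items()}

# Hypothetical intensities; spots s4 and s5 are unmatched and are ignored.
control_gel = {"s1": 100.0, "s2": 300.0, "s3": 50.0, "s4": 600.0}
lps_gel = {"s1": 400.0, "s2": 600.0, "s3": 1000.0, "s5": 80.0}
pairs = normalize_pair(control_gel, lps_gel)
ratios = fold_changes(pairs)
```

After scaling, a ratio above 1.5 for a spot corresponds to the 1.5-fold threshold applied to the narrow-range gels.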

In-gel Tryptic Digestion — In-gel digestion of protein spots was per- 
formed with sequencing grade porcine-modified trypsin using the 
method of Hellman et al. (27). Tryptic peptides were then extracted (50
μl of 50% acetonitrile/5% trifluoroacetic acid, 2 h), and the supernatant
was taken to dryness in a vacuum centrifuge and then redissolved in
trifluoroacetic acid (20 μl, 0.5%). Peptides were then purified and concentrated
using ZipTip C18 pipette tips (Millipore, Bedford, MA).

MALDI-TOF Mass Spectrometry — Analyses were performed on an 
Applied Biosystems matrix-assisted laser desorption ionization time-of-flight
(MALDI-TOF) Voyager-DE PRO mass spectrometer (Framingham,
MA) operated in delayed extraction mode. Samples (0.5 μl) were
spotted onto a sample plate to which matrix (0.5 μl of 10 mg/ml CHCA)
was added. The sample-matrix mixture was dried at room temperature 
and then analyzed in reflector mode. CHCA was also spotted alone as a 
negative control. Spectra were the sum of 100 laser shots, and those 
peaks with a signal-to-noise ratio of greater than 3:1 were selected for
data base searching. Spectra were internally calibrated using autolytic
trypsin peptides (m/z 842.51, 2211.10).

Data Base Searching Algorithm — The monoisotopic masses for each 
protonated peptide were: (a) entered into the program MS-Fit (available 
at prospector.ucsf.edu) for searches against the Swiss-Prot, NCBI, 
and GenPept databases, and (b) entered into Mascot (available at
matrixscience.com), an algorithm testing statistical significance of peptide
mass fingerprinting identifications. For MS-Fit searches, masses
derived from trypsin, CHCA, keratin, and Coomassie Brilliant Blue 
G-250 were excluded. Search parameters included a maximum allowed 
peptide mass error of 0.1 Da (0.8 Da in the few instances in which linear 
mode was used), consideration of one incomplete cleavage per peptide, 
pI range of 3.0-10.0, and molecular mass range of 1-200 kDa. Accepted
modifications included carbamidomethylation of cysteine residues 
(from iodoacetamide exposure following IEF) (28) and methionine oxi- 
dation, a common modification occurring during SDS-PAGE (29). Pro- 
tein identifications were assigned when three criteria were met: 1) 
statistical significance (p < 0.05) of the match when tested by Mascot 
(matrixscience.com); 2) >20% sequence coverage by the tryptic peptides;
and 3) concordance (±15%) with the molecular weight and pI of
the parent two-dimensional PAGE protein spot. The following special 
exceptions were considered: (a) protein identifications not fulfilling 
criterion 2 were still assigned if criteria 1 and 3 were fulfilled and no 
other Homo sapiens proteins with peptide mass-matched p values <
0.05 were identified by Mascot; (b) if criterion 3 was not fulfilled (lower
than expected molecular weight), a cleavage product of the identified 
protein was inferred, and the cumulative molecular weight of the tryptic 
peptides was compared with that of the two-dimensional-PAGE spot to 
ensure that it was not exceeded; (c) if criterion 3 was not fulfilled (isolated 
discordance between theoretical and observed pI), post-translational modification
of an unrecovered peptide was inferred; and (d) if two or more H.
sapiens protein assignments with > 4 mutually exclusive matching peptides
were identified, a protein mixture in the two-dimensional PAGE
spot was inferred and further analysis halted (quantitative conclusions
regarding the individual protein constituents could not be drawn). 
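
The assignment rules above can be read as a small decision procedure. The sketch below is one possible encoding, offered as an interpretation rather than the authors' implementation: the function and parameter names are hypothetical, the text does not say whether exceptions (b) and (c) still require criterion 2 (that is assumed here), and exception (d), the mixture case, is not modelled.

```python
def within(observed, theoretical, tol=0.15):
    """True when observed lies within +/- tol (fractional) of theoretical."""
    return abs(observed - theoretical) <= tol * theoretical

def assess_identification(mascot_p, coverage_pct, mw_obs, mw_theo,
                          pi_obs, pi_theo, other_hits=0, peptide_mass_sum=0.0):
    """Apply criteria 1-3 and exceptions (a)-(c); masses are in kDa."""
    c1 = mascot_p < 0.05            # criterion 1: significant Mascot match
    c2 = coverage_pct > 20.0        # criterion 2: >20% sequence coverage
    mw_ok = within(mw_obs, mw_theo)
    pi_ok = within(pi_obs, pi_theo)  # criterion 3 needs both mw_ok and pi_ok
    if not c1:
        return "rejected"
    if mw_ok and pi_ok:
        if c2:
            return "assigned"
        # Exception (a): low coverage tolerated when no competing hit exists.
        return "assigned (a)" if other_hits == 0 else "rejected"
    if not c2:
        return "rejected"  # assumption: (b) and (c) still require criterion 2
    # Exception (c): only the pI disagrees.
    if mw_ok and not pi_ok:
        return "modified form (c)"
    # Exception (b): spot lighter than expected; matched peptides must fit in it.
    if mw_obs < mw_theo and peptide_mass_sum <= mw_obs:
        return "cleavage product (b)"
    return "rejected"

full = assess_identification(0.01, 35, 60, 62, 5.6, 5.5)
low_cov = assess_identification(0.01, 12, 60, 62, 5.6, 5.5, other_hits=0)
fragment = assess_identification(0.01, 30, 30, 62, 5.6, 5.5, peptide_mass_sum=18)
shifted = assess_identification(0.01, 30, 60, 62, 7.5, 5.5)
```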

RESULTS 

Genes Differentially Expressed in LPS-stimulated Neutrophils —
Human PMNs were left untreated or incubated in the
presence of 100 ng/ml LPS for 4 h. As a control to confirm that 
the PMNs were quiescent at baseline and that LPS resulted in 
normal stimulation, mRNA was isolated, cDNA was prepared, 
and PCR for TNF-α was performed. Little TNF-α expression
was seen in nonstimulated cells, whereas LPS treatment led to 
an increase in expression in each of the donors subsequently 
used for microarray analysis (data not shown). No macrophage- 
colony stimulating factor receptor transcript was detected by 
oligonucleotide microarray analysis, confirming there was no 
significant monocytic contamination. 

Human PMNs express a limited repertoire of mRNA tran- 
scripts at baseline but respond to LPS with differential expres- 
sion of genes in many families. Considering only those genes 
present by microarray analysis in all three donors, unstimu- 
lated PMNs expressed 13.0% (923 of 7070 genes) of the Af- 
fymetrix gene set. Gene classes represented at baseline include 
metabolic enzymes, structural proteins, receptors, signaling 
proteins, and transcription factors. By comparison, human 
monocytes expressed ~40% and human fibroblasts ~35% of the
represented genes (data not shown). By the criterion of a >3- 
fold increase in expression in all three donors on Affymetrix 
oligonucleotide array analysis, exposure of PMNs to LPS for 4 h 
resulted in the up-regulation of 100 genes (Table I). 
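
The selection rule used here, a >3-fold increase in expression in all three donors, can be expressed as a simple filter. The sketch below uses invented gene labels and signal values purely for illustration; it does not reproduce the Affymetrix software or the study data.

```python
def upregulated_genes(baseline, stimulated, min_fold=3.0):
    """Return genes whose signal rises more than `min_fold` in every donor.

    `baseline` and `stimulated` map gene name -> list of per-donor signal
    values, with donors listed in the same order in both.
    """
    hits = []
    for gene, base in baseline.items():
        stim = stimulated[gene]
        if all(s > min_fold * b for b, s in zip(base, stim)):
            hits.append(gene)
    return hits

# Hypothetical per-donor signals (three donors); not real measurements.
baseline = {"geneA": [50, 40, 60], "geneB": [900, 950, 870], "geneC": [20, 25, 30]}
stimulated = {"geneA": [400, 300, 250], "geneB": [910, 940, 900], "geneC": [100, 70, 200]}
hits = upregulated_genes(baseline, stimulated)
```

In this example geneC is excluded because one donor falls short of the threshold, which shows how the all-donors requirement trades sensitivity for reproducibility.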

Genes from several different functional classes were induced 
in PMNs following LPS exposure. Of interest, a number of 
transcriptional regulators were induced, including transcription
factors of the NF-κB family. The transcriptional NF-κB
complex has previously been implicated in the regulation of the 
genes induced by LPS (11). The genes for several cytokines and 
chemokines were also found to be up-regulated. These include 
TNF-α, IL-1β, IL-6, MCP-1, MIP-3α, and MIP-1β (Table I).
PCR was performed to confirm the results from the microarray 
analysis. PCR analysis on selected genes indicates that the 
time course for changes can be rapid or delayed but parallels the
changes found in the array at the 4-h time point (data not 
shown). Other up-regulated genes included those for metabolic 
enzymes, immune response molecules, kinases, phosphatases, 
signaling molecules, adhesion and cytoskeletal components, 
interferon-stimulated genes, and those with unknown or mis- 
cellaneous function (Table I). 

LPS stimulation of PMN also resulted in the down-regula- 
tion of 56 genes (Table II). Down-regulated genes were identi- 
fied as transcriptional regulators, protein and lipid kinases and 
phosphatases, structural molecules, and signaling molecules. 
Genes for metabolic proteins were also evident, as were several 
uncharacterized genes. 

Two-dimensional PAGE and Image Analysis — In contrast to 
the limited number of transcripts found at baseline, PMNs 
were found to express a large number and variety of proteins in 
the nonstimulated state (Fig. 1, A and C, and Tables III-V).
Reproducible protein expression patterns were found on the pH 
3.0-10.0 gels, and the majority of proteins fell in the pH 5.0-7.0
range (Fig. 1A). The basic region (pH > 7.0) consistently
exhibited poor resolution, precluding meaningful image analy- 
sis and further workup (data not shown). Depending on the 
spot-finding parameters (minimum spot intensity, filter width) 
selected on the image analysis software, spot-by-spot manual 
editing was found to be necessary to avoid over- and underde- 
tected spots; moreover, further manual editing was performed 
to screen for unmatched and mismatched spots following 
matching of paired control and LPS-stimulated gels. After spot
editing, ~1200 well-resolved spots were evident on each pH
3.0-10.0 gel. In an attempt to improve resolution of the pI
range bearing the greatest number of well-resolved spots, over- 
lapping narrow pH range gels (pH 5.0-6.0, 5.5-6.7, 6-11) were 
also run. Of interest, a similar number of well-resolved spots 
(~1200) were detected on the narrow pH range gels (Fig. 1, C
and D). Assuming a detection limit for Coomassie of 15 ng (0.25 
pmol, or 1.5 × 10^11 molecules, for a 60-kDa protein) and a
protein load per gel corresponding to 75 × 10^6 PMNs, we
estimate a detection limit on our gels of 2000 molecules/cell for 
a 60-kDa protein. As investigators have suggested in other cell 
lines with the use of high resolution two-dimensional-PAGE 
methods (30), we estimate that > 10,000 proteins are expressed 
in the resting PMN. 
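
The detection-limit estimate above follows from a short unit conversion, reproduced in the sketch below. The function name is hypothetical; the inputs (15 ng, 60 kDa, 75 × 10^6 cells) are the values stated in the text.

```python
AVOGADRO = 6.022e23  # molecules per mole

def molecules_per_cell(detect_ng, mass_kda, cells_loaded):
    """Smallest per-cell copy number that yields a detectable spot."""
    grams = detect_ng * 1e-9
    moles = grams / (mass_kda * 1000.0)   # 1 kDa corresponds to 1000 g/mol
    molecules = moles * AVOGADRO
    return molecules / cells_loaded

# 15 ng of a 60-kDa protein is 0.25 pmol, about 1.5e11 molecules,
# spread over the 75e6 cells loaded per gel.
limit = molecules_per_cell(15, 60, 75e6)
```

The result is on the order of 2000 molecules per cell, in line with the figure quoted in the text. Smaller proteins give a higher limit in molecules per cell, because the same mass corresponds to more molecules.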

Human PMNs respond to LPS with the differential expres- 
sion of a large number of proteins. In the six individual pH 
3.0-10.0 experiments, the number of protein spots that in- 
creased in integrated intensity by at least 50% following LPS 
exposure was 185, 122, 104, 104, 96, and 131, respectively. The 
number of protein spots that decreased by at least 50% follow- 
ing LPS exposure was 72, 151, 102, 98, 128, and 97, respec- 
tively. Although gel-to-gel regional variability in resolution was 
expected to account for individual spots not being well visual- 
ized on particular gels, only those spots that were matched to 
all 12 original gels were analyzed further. Overall, the number 
of spots matched to all 12 original gels was 125. The numbers 
of spots that were both matched to all 12 original gels and that 
increased by at least 50% in integrated intensity in the indi- 
vidual experiments following LPS exposure were 46, 13, 17, 27, 
22, and 20, respectively. The numbers of spots that were 
matched to all 12 gels and that decreased by at least 50% were 
6, 22, 17, 22, 34, and 28, respectively. The LPS-induced change 
in integrated intensity of the 125 spots that were matched to all 
12 original gels was subjected to statistical analysis with a 
two-tailed Student's t test, and those spots with statistically 
significant (p < 0.05) regulation among the six experiments 
were identified by peptide mass fingerprinting (Table III). 
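
The spot-level significance test can be sketched as follows. The text does not say whether the test was paired, so this assumes a paired comparison of control and LPS intensities within each of the six experiments; the intensity values are hypothetical, and 2.571 is the standard two-tailed critical t value for p = 0.05 with 5 degrees of freedom.

```python
import math
from statistics import mean, stdev

T_CRIT_DF5 = 2.571  # two-tailed critical t, p = 0.05, 5 degrees of freedom

def paired_t(control, treated):
    """t statistic for the mean within-experiment difference (treated - control)."""
    diffs = [t - c for c, t in zip(control, treated)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical integrated intensities for one spot across six experiments
control = [100, 120, 95, 110, 105, 98]
treated = [150, 170, 160, 155, 165, 140]

t_stat = paired_t(control, treated)
significant = abs(t_stat) > T_CRIT_DF5
print(f"t = {t_stat:.2f}, significant at p < 0.05: {significant}")
```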

Identification of LPS-regulated Proteins— Several proteins 
were consistently up-regulated on the pH 3.0-10.0 gels (Table 
III), including regulators of inflammation (annexin III) and 
signaling molecules (Rab-GDP dissociation inhibitor β). Sev- 
eral actin fragments were seen to be consistently up-regulated 
in the six experiments following LPS exposure (Table III). Of 
interest, the proteasome β chain was also consistently up- 
regulated. Down-regulated proteins included other signaling 
molecules, such as Rho GTPase activating protein 1. 

On the pH 5.0-6.0 and 5.5-6.7 gels, several proteins were 
found to show increases of greater than 1.5-fold following LPS 
exposure (Tables IV and V), including cytoskeletal proteins, 
such as moesin, nonmuscle myosin heavy chain, and a putative 
phosphorylated form of nonmuscle myosin heavy chain, and 
signaling molecules, such as protein phosphatase 1 and PO4- 
stathmin. The putative phosphorylated form of nonmuscle my- 
osin heavy chain (spot #1101) was positioned 0.03 pH unit more 
acidic than the unmodified protein (spot #1102) (Fig. 1D) and 
was distinguished by a tryptic peptide (m/z 1366.74) not pres- 
ent in the unmodified protein, consistent with phosphorylation 
of serine 685. Serine 685 is predicted by NetPhos 2.0 Prediction 
Server (available at www.cbs.dtu.dk/services/NetPhos/ (31)) to 
be a high probability phosphorylation residue and by Scan- 
Prosite (www.expasy.ch/tools/scnpsite.html) to be a substrate 
for protein kinase C. The tryptic phosphopeptide identified in 
PO4-stathmin, extending from residues 15 to 27 (1468.7 Da), is 
consistent with phosphorylation of either serine 16, a known 
substrate for Ca2+/calmodulin (CaM)-dependent kinases (32), 
or serine 25, a known substrate for p38δ and ERK (Fig. 2A) 
Table I 

Human neutrophil genes induced after 4 h of LPS exposure 

Description  GenBank™ no.  Change (-fold) 

Transcriptional regulation 
Pleomorphic adenoma gene-like 2  D83784  16.8 
NFKB2  S76638  12.3 
NFKBIE  U91616  11.5 
p65  L19067  8.4 
BCL3  U05681  7.7 
X-box binding protein 1  M31627  7.5 
Metal-regulatory transcription factor 1  X78710  7.4 
Ets-2  J04102  7.4 
c-Rel  X75042  6.2 
NFKB1  M58603  5.8 
Basic leucine zipper transcription factor, ATF-like  U15460  4.7 
IκB  M69043  3.8 
MAX dimerization protein  L06895  3.6 
DIF2  S81914  3.1 

Cytokines and receptors 
MCP-1  M69203  78.7 
MIP-1β  M72885  48.8 
α-Helix coiled-coil rod homolog  AF014958  20.8 
IL-1β  X04500  17.6 
GROβ  M57731  17.3 
TNF-α  X02910  14.5 
MIP-3α  U64197  8.1 
IL10RA  U00672  7.3 
IL-6  Y00081  6.3 
GROα  X54489  4 
HM74  D10923  3.8 

Immune response 
Orosomucoid  X02544  20.2 
Complement component C3  K02765  12.8 
Protease inhibitor 9  U71364  9.5 
Complement component 3a receptor 1  U28488  6.1 
Protease inhibitor 3  L10343  4.9 
SLPI/antileukoprotease  X04470  4.7 
ELANH2/elastase inhibitor  M93056  4.6 
CD58  Y00636  3.8 
Complement component PFC  M83652  3.5 

Kinases 
CNK/FNK/PLK-like  U56998  16.2 
Cot  D14497  11.9 
Pim-2  U77735  9.5 
LIMK2  D45906  4.3 

Phosphatases 
PAC-1/DUSP2  L11329  11.8 
DUSP5  U15932  5.3 
PHA1  U73477  3.4 

Signaling molecules 
TNFAIP1/A20  M59465  10 
TRAF1  U19261  6.2 
RanBP2  D42063  5.6 
GNAW  M63904  5.2 
PTAFR  D10202  3.9 

Adhesion and cytoskeleton 
ICAM1  M24283  22.4 
CEACAM1 (biliary glycoprotein)  X16354  6.3 
LIMS1  U09284  6.1 
SNL/actin bundling protein  U03057  5.9 
Galectin-1/LGALS1  M57710  4.7 
MEMD/ALCAM  U30999  4.2 
CD44  HG2981-HT3125  3.9 
TSG-6  M31165  3.7 

Metabolic 
GTP cyclohydrolase I  U19523  13.5 
NDUFV2/ubiquinone reductase  M22538  8.6 
PSMA6/(proteasome iota)  X59417  8.4 
UDP-galactose transporter (SLC35A2)  D84454  7.3 
PLAU (urokinase)  X02419  6.4 
KYNU/L-kynurenine hydrolase  U57721  5.5 
AMPD3  D12775  5 
P4HA1/prolyl 4-hydroxylase  M24486  4.7 
γ-Glutamylcysteine synthetase  L35546  4.5 
ATP6D  J05682  4.2 
ATP6S1  D16469  4 
Glycerol kinase  X68285  3.6 
FACL1  L09229  3.5 
AK3  X60673  3.3 

Interferon-inducible 
ISG15  M13755  22.5 
Mx1  M33882  19.4 
IFI56  M24594  12.1 
INDO  M34455  5.2 
GBP1  M55542  4.3 
PRKR  U50648  3.7 
IFIT4  U52513  3.6 
IFI54  M14660  3.5 
IFI58  U34605  3.5 
IFP35  U72882  3 

Other 
Gos2  M72885  48.8 
MIHC/IAP1  U37546  7.2 
KIAA0105  D14661  5.1 
KIAA0118  D42087  5 
SNAP23  U55936  5 
CASP5  U28015  4.8 
KIAA0213  D30755  4.8 
KIAA0255  D87444  4.7 
Hepatoma-derived GF  D16431  4.7 
PTGS2  D28235  4.6 
CD48  M37766  4.3 
UNC119 homolog  U40998  4.2 
KIAA0151  D63485  3.9 
fla&ifc  XM035660  3.8 
Annexin VII  J04543  3.7 
KIAA0110  D14811  3.7 
Adrenomedullin  D14874  3.7 
AIM1  U83115  3.6 
D87437  3.2 
P5-1  L06175  3.2 
Scavenger receptor expressed by endothelial cells  D63483  3.2 
VHL  L15409  3.1 



(33). Assuming that no other multiply phosphorylated stath- 
min species had escaped detection, analysis of the integrated 
intensities of the PO4-stathmin and stathmin spots indicates 
that the percentage of the PO4 form of total cellular stathmin 
increased from 11% to 38% with LPS stimulation (Fig. 2B). 
This is similar to a previous report of an increase from <10% to 
35-40% of the Ser25-phosphorylated form in Jurkat cells stim- 
ulated with anti-CD3 (34). 
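
The stathmin figures can be restated as two small calculations: the phosphorylated fraction from spot intensities, and the mass arithmetic from the Fig. 2 legend (1388.72 Da versus 1468.72 Da, 14 ppm agreement). This is a sketch only; the spot intensities are hypothetical values chosen to reproduce the reported 11% and 38%, and the function names are illustrative.

```python
def phospho_fraction(phospho_intensity, unmodified_intensity):
    """Share of total stathmin in the phosphorylated spot (assumes only two species)."""
    return phospho_intensity / (phospho_intensity + unmodified_intensity)

def ppm_error(measured, predicted):
    """Relative mass error in parts per million."""
    return (measured - predicted) / predicted * 1e6

# Hypothetical integrated intensities (arbitrary units)
control_fraction = phospho_fraction(11.0, 89.0)
lps_fraction = phospho_fraction(38.0, 62.0)

# Peptide masses from the Fig. 2 legend
mass_shift = 1468.72 - 1388.72          # nominal phosphate addition, about 80 Da
tolerance_da = 1468.72 * 14e-6          # what a 14 ppm difference means in Da

print(f"{control_fraction:.0%} -> {lps_fraction:.0%}")
print(f"shift {mass_shift:.2f} Da, 14 ppm = {tolerance_da:.4f} Da")
```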

Effect of SB203580 on LPS-stimulated Gene Expression- 
Gene expression analysis of PMNs stimulated with LPS indi- 
cated that the majority of genes induced by LPS were unaf- 
fected by prior treatment of PMN with SB203580. Of the 100 
genes up-regulated by LPS, the up-regulation of 23 was inhib- 
ited by greater than 40% (Table VI). The majority of these 
genes affected by SB203580 were inhibited by less than 60%, 
whereas only six were inhibited by greater than 80%, all of 
which represent previously identified interferon-stimulated 
genes. Induction of cytokine genes by LPS, with the exception 
of IL-6, was generally unaffected by SB203580. 
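
The inhibition thresholds in this paragraph map directly onto the SB203580/control ratios of Table VI: a ratio below 0.60 corresponds to more than 40% inhibition. A minimal sketch, using four ratios taken from Table VI and an illustrative helper name:

```python
def percent_inhibition(ratio):
    """Convert an SB203580/control expression ratio into percent inhibition."""
    return (1.0 - ratio) * 100.0

# A few SB203580/control ratios as listed in Table VI
ratios = {"ISG15": 0.09, "IL-6": 0.45, "Rel": 0.50, "PI-9": 0.57}

inhibition = {gene: percent_inhibition(r) for gene, r in ratios.items()}
for gene, pct in inhibition.items():
    label = ">80%" if pct > 80 else ">40%" if pct > 40 else "<=40%"
    print(f"{gene}: {pct:.0f}% inhibited ({label})")
```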

Effect of SB203580 on LPS-stimulated Protein Expression — 
Similar to the effect of SB203580 on LPS-stimulated gene 
expression, little effect of SB203580 was seen on expression 
levels for the majority of LPS-regulated proteins (Table VII). 
Two exceptions are annexin III and α-enolase, for which LPS- 
stimulated expression was attenuated in the presence of the 
p38 MAPK inhibitor. 

Comparison of Microarray and Proteomics Results — Of the 
LPS-regulated proteins identified by peptide mass fingerprint- 
ing for which probes were present on the oligonucleotide mi- 
croarray, poor concordance was found at the mRNA level (Table 
VIII). For 13 LPS-up-regulated proteins, 2 corresponding 
mRNA transcripts were up-regulated, 1 was down-regulated, 5 
were unchanged, and 5 were not detected by the Affymetrix 
chip. For 5 down-regulated proteins, 3 corresponding tran- 
scripts were down-regulated, 1 was unchanged, and 1 was not 
detected. Varying patterns of LPS regulation emerge for those 
candidates detected at both the transcript and protein level. 
Proteasome β chain was up-regulated at both the transcript 
and protein levels (Table VIII), with no notable effect of 
SB203580 on expression at either level. Similarly, CAP1, Rho- 
GAP1, and ficolin 1 were down-regulated at both the mRNA 
transcript and protein level (Table VIII), with no notable effect 
of SB203580. Annexin III was down-regulated at the transcript 
level and up-regulated at the protein level, with an inhibitory 
effect of SB203580 seen only at the protein level (Tables VII 
and VIII). 
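
The concordance counts reported above can be tallied explicitly. The sketch below uses only the counts stated in this section; the dictionary layout and the `concordance` helper are illustrative, and "not detected" transcripts are counted in the denominator, which is one of several defensible choices.

```python
def concordance(mrna_outcomes, protein_direction):
    """Fraction of proteins whose transcript changed in the same direction."""
    total = sum(mrna_outcomes.values())
    return mrna_outcomes.get(protein_direction, 0) / total

# Transcript outcomes for the 13 up-regulated and 5 down-regulated proteins
up_regulated = {"up": 2, "down": 1, "unchanged": 5, "not detected": 5}
down_regulated = {"down": 3, "unchanged": 1, "not detected": 1}

up_match = concordance(up_regulated, "up")
down_match = concordance(down_regulated, "down")
print(f"up-regulated proteins: {up_match:.0%} concordant")
print(f"down-regulated proteins: {down_match:.0%} concordant")
```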

DISCUSSION 

Interaction of bacterial LPS with the human PMN repre- 
sents a model system for studying the activation and output of 
the innate immune system during infection and inflammation. 
A recent publication (35) describes the gene expression changes 
of a cultured monocytic cell line after infection by the Gram- 
positive bacterium Listeria monocytogenes. The cell wall com- 
ponents of Gram-positive bacteria, like Gram-negative-derived 
LPS (i.e. from E. coli), are known to signal through TLRs (36, 
37). Importantly, many of the expression changes found in 
LPS-stimulated PMNs in the present study were also described 
in the bacteria-exposed monocytic cells, indicating that many of 
the gene expression changes seen in bacterial infection are 
likely mediated by TLRs (38, 39) and that the LPS model 
system accurately reflects exposure of immune cells to infec- 
Table II 

Human neutrophil genes repressed (>4-fold) after 4 h of LPS exposure 

Description  GenBank™ no.  Change (-fold) 

Kinases 
CAMK, II, gamma  U50360  -4 
Diacylglycerol kinase, delta  D63479  -4.2 
PRKCL2/PRK2 protein kinase C-like 2  U33052  -4.3 
MAPKAPK3  U09578  -6.3 
Protein kinase Ht31, cAMP-dependent  HG2167-HT2237  -8 
CAMK II  L07044  -9.8 

Transporters 
SLC25A5/solute carrier family 25, member 5  J02683  -4.2 
SLC19A1; folate transporter  U17566  -4.4 
SLC2A3; facilitated glucose transporter  M20681  -5 

Metabolic 
Carbonic anhydrase IV  L10955  -4.4 
RNase A family, k6  U64998  -4.5 
Glycogen phosphorylase; liver  M14636  -4.6 
Inositol polyphosphate-5-phosphatase  U57650  -4.6 
Inositol 1,3,4-trisphosphate 5/6-kinase  U51336  -4.7 
Transketolase  L12711  -4.8 
Protein phosphatase 4, reg. subunit 1 (clone 23840)  U79267  -4.9 
Cytidine deaminase  L27943  -5.4 
MGAT1  M55621  -5.4 
HMOX1  X06985  -5.4 
MAN2A2  L28821  -5.8 
Glycogenin (also represents U31525)  HG4334-HT4604  -5.9 

Structural 
Fibrinogen-like protein (pT49 protein)  Z36531  -4.2 
H2AFZ  M37583  -4.7 
Paxillin  U14588  -4.9 
Lamin B R  L25931  -5.9 
Dynamin 2  L36983  -6.2 
Actinin 1  M95178  -6.7 
α-Tubulin  X01703  -10 
Tubulin, α1, isoform 44  HG2259-HT2348  -15 

Transcriptional regulators 
Lymphoblastic leukemia-derived sequence 1  M22638  -4.4 
MAX-interacting protein 1  L07648  -4.5 
Nuclear factor erythroid 2 isoform f  S77763  -6 
Transducer of ERBB2, 1  D38305  -6.9 
NFATC4  L41067  -7.8 
ATF-2 (CRE-Bpa)  L05515  -9.6 

Receptors 
Lymphotoxin β receptor  L04270  -4.4 
Folate receptor 3 (gamma)  U08471  -5 
U11875  -5.3 

Signaling 
Pix α; cool-2 (KIAA0006)  D25304  -4.5 
ARHB/RhoB  M12174  -4.5 
TNFSF10; TRAIL  U37518  -6.6 

Ca2+ binding 
ANX11  L19605  -4.3 
S100A4  M80563  -4.8 
ANX1  X05908  -4.8 

Other 
Proteolipid protein 2  L09604  -4.9 
Protein phosphatase 1, α catalytic subunit  HG1614-HT1614  -5 
TIMP2  M32304  -5.1 
KIAA0199  D83782  -5.2 
Lipin 2 (KIAA0249)  D87436  -5.6 
LRMP (Jaw1)  U10485  -5.8 
CUGBP2  U69546  -6.9 
Clone 23933  U79273  -7 
PECAM1  L34657  -8 
Delta sleep-inducing peptide  Z50781  -8.7 
DiGeorge synd. critical region gene 2 (KIAA0163)  D79985  -9 
SELPLG; CD162; selectin P ligand  U25956  -32 



tion. Nevertheless, the reliance upon DNA microarrays alone 
affords insight only into the transcriptional response without 
corroboration at the protein level. In the present study, appli- 
cation of both DNA microarray and proteomics technology to 
our model system provides unique insight into both the cellular 
biology of the activated PMN and the responsiveness and reg- 



Fig. 1. Two-dimensional PAGE of LPS-exposed human PMNs. 

A and B, colloidal Coomassie Blue-stained pH 3.0-10.0, two-dimen- 
sional PAGE gels (A, control; B, LPS-exposed) with up-regulated (solid 
arrows) and down-regulated (hatched arrows) proteins indicated. These 
results are representative of six separate experiments. C and D, colloi- 
dal Coomassie Blue-stained pH 5.0-6.0, two-dimensional PAGE gels 
(C, control; D, LPS-exposed) with up-regulated (solid arrows), new 
(solid arrow, open arrowhead), and down-regulated (hatched arrows) 
proteins indicated. LPS-exposed PMNs from three blood donors were 
pooled. 

ulation of its transcriptional and translational machinery. As 
will be discussed below, our study identifies, in particular, 
novel aspects of the LPS-stimulated PMN transcriptional reg- 
ulation, activity in the innate immune response, signaling, 
cytoskeletal reorganization, and priming for granule release. 
In the present study, the increase in NF-κB transcript abun- 
dance (Table I) detected by the microarrays corroborates the 
findings of other studies of PMNs and monocytes (40) and 
indicates a mechanism for the responsiveness and scope of the 
PMN transcriptional machinery following LPS exposure. NF- 
κB, recently described to be activated by LPS through the 
TLR/MyD88/interleukin-1 receptor-associated kinase pathway 
(1, 4), is the only transcriptional complex reported to be in- 
duced by LPS in the PMN. However, because the transcrip- 
tional NF-κB complex has been implicated in the regulation of 
only a portion of the genes induced by LPS in this study (data 
not shown), the importance of alternative transcriptional reg- 
ulators in the PMN is clear. Of interest, several other known 
and putative transcriptional regulators with less well defined 
functions were also up-regulated in the present study, includ- 
ing PLAGL2, a putative zinc-finger protein, XBP-1, MTF-1, 
Ets-2, B-ATF, and DIF-2. On the other hand, LPS-down-regu- 
lated genes include ATF-2 (a known target of p38), NFATC4, 
TOB-1, NF-E2, MXI-1, and LYL-1. Although the exact role of 
these gene products in regulating cell function is unknown, 



these data indicate that the range of transcriptional responses 
in the LPS-stimulated PMN is much broader than previously 
suggested and that the signaling capabilities of the PMN in the 
immune response are thereby likely extended in scope and 
specificity. 

As expected from the literature, the genes for several cyto- 
kines and chemokines, including IL-1β, IL-6, and MIP-1β, were 
found to be up-regulated (Table I). On the other hand, the 
notable absence of up-regulated cytokines in the proteomics 
experiments reflects their removal in the post-LPS incubation 
wash performed prior to lysis for two-dimensional-PAGE. Up- 
regulation of these inflammatory mediators is well documented 
in PMNs exposed to LPS and in animal models of LPS-induced 
sepsis syndrome and acute respiratory distress syndrome, a 
PMN-mediated illness (41, 42). Several genes in this family 
were up-regulated that have not, to our knowledge, been de- 
scribed in LPS-stimulated cells, including MCP-1, GROβ, 
IL-10RA, and HM74, an orphan G protein-coupled receptor 
with homology to chemokine receptors. The down-regulation of 
TNFSF10, lymphotoxin β receptor, and TNFAIP1 was also 
observed. The modulation of genes involved in cytokine signal- 
ing, including the adapter molecules TRAF1 (LPS and TNF 
receptor signaling) and TNFAIP1 (TNF receptor signaling) and 
several kinases and phosphatases, may indicate a change in 
cytokine responsiveness after LPS treatment. Relevant in this 
regard from the proteomics data are: 1) the up-regulation of 
protein phosphatase 1, which has been shown to regulate PMN 
NADPH oxidase activation and translocation (43, 44) and to 
regulate LPS-induced NF-κB activation (45); 2) the down-reg- 
ulation of Rho-GAP1, which has been shown to regulate 
NADPH oxidase activity in the PMN (46); and 3) the up- 
regulation of PO4-stathmin (Table IV), a phosphoprotein pos- 
tulated to function as a relayer and integrator of multiple 
signal transduction pathways (34). Several noncytokine, 
nonchemokine genes involved in the immune response were 
also up-regulated, including the complement pathway mem- 
bers C3, C3AR1, and PFC; the protease inhibitors ELANH2 
(elastase inhibitor), SLPI, PI-3, and PI-9; and the acute phase 
protein orosomucoid. LPS regulation of C3AR1 and orosomu- 
coid expression have not previously been reported. In the pro- 
teomics experiments, the down-regulation of ficolin-1 (Table III), 
a collectin-like cell surface protein reported to activate the com- 
plement system and to mediate adhesion and phagocytosis in 
monocytes but not previously reported in granulocytes (47), may 
represent negative modulation of the innate immune response. 
The finding that genes other than cytokines and chemokines are 
regulated by the PMN in response to LPS indicates that the PMN 
plays a more sophisticated role in host-defense and immunity 
than previously thought. 

Treatment of the PMN with LPS led to the induction of a set 
of genes associated with the anti-viral Type I interferons, 
IFNα/β. This induction occurs independently of the release of 
IFN or another unidentified soluble factor. 2 Furthermore, the 
set of genes expressed is smaller than that induced by IFNα/β, 
as described by Der et al. (12). This may be due to differences in 
the scope of the signaling systems activated by LPS and 
IFNα/β, or the time course of analysis of genes in the LPS- 
stimulated PMN. The implication that LPS treatment of PMN 
allows PMN to express anti-viral activity is currently being 
tested. Of interest was the finding that induction of interferon- 
stimulated genes was blocked by pretreatment of PMNs with 
SB203580. Work from our laboratory has indicated that signal 
transducers and activators of transcription activation does not 
occur in response to LPS in PMNs. 2 In addition, interferon- 



2 K. C. Malcolm and G. S. Worthen, manuscript in preparation. 
Table III 

Analysis of pH 3.0-10.0 two-dimensional PAGE gels 
Mean change (-fold) in expression level among six PMN donors is reported. The change in expression for the proteins listed was statistically 
significant (p < 0.05) as measured by a two-tailed Student's t test. 

Identification [spot no.]  Swiss-Prot no.  Estimated Mr/pI  Theoretical Mr/pI  Peptides matched/submitted  Protein covered  Mean change (-fold) 

Up-regulated 
Proteasome β chain [646]  P28070  27/5.7  29.2/5.72  9/12 (75%)  36%  1.51 
Annexin III [550]  P12429  31/5.7  36.4/5.6  14/18 (78%)  42%  1.37 
Actin fragment [544] a  P02570  32/5.5  (41.7/5.29)  13/15 (87%)  (34%)  1.74 
Actin fragment [591] a  P02570  30/5.4  (41.7/5.29)  14/18 (78%)  (29%)  1.60 
α-Enolase [380]  P06733  41/5.7  47.2/7.01  9/10 (90%)  24%  1.65 
Rab-GDP dissociation inhibitor β [289]  P50395  50/6.1  50.7/6.11  10/11 (91%)  25%  1.24 
Glutathione S-transferase P [648]  P09211  23/5.5  23.4/5.43  6/8 (75%)  41%  1.54 
Pre-B-cell colony enhancing factor [1152]  P43490  53/7.0  55.5/6.69  12/16 (75%)  25%  1.29 

Down-regulated 
Adenylyl cyclase-associated protein 1 [256]  Q01518  55/7.3  51.7/8.07  16/22 (73%)  34%  0.53 
Rho-GAP1 [283]  Q07960  50/5.8  50.4/5.85  7/9 (78%)  22%  0.67 
Ficolin 1 [511]  O00602  33/6.5  35/6.39  10/12 (83%)  25%  0.74 

a The theoretical pI and Mr of native actin are indicated. Protein coverage indicates coverage of native actin. 

Table IV 

Analysis of pH 5.0-6.0 two-dimensional PAGE gels 
Results are from pooled samples for control (n = 3) and LPS-exposed (n = 3) PMNs from human donors. Expression of the reported proteins was 
altered > 1.5-fold following LPS exposure in two repeat experiments. "New" designates proteins seen in the LPS gel in two repeat experiments but 
not detectable in the corresponding control gels. 

Identification [spot no.]  Swiss-Prot no.  Estimated Mr/pI  Theoretical Mr/pI  Peptides matched/submitted (%)  Protein covered (%)  Change (-fold) 

Up-regulated 
Protein-tyrosine kinase 9-like [468]  Q9Y3F5 a  34/5.81  39.5/6.37  10/14 (71%)  34%  1.8 
Protein phosphatase 1, catalytic subunit, β isoform [378]  P37140  38/5.73  37.2/5.84  7/10 (70%)  22%  2.0 
PO4-stathmin [577]  P16949 b  18/5.36  17.3/5.76  9/12 (75%)  42%  2.1 e 
Nonmuscle myosin heavy chain [1102]  189036 c  145/5.32  145/5.23  20/21 (95%)  17%  New 
Putative PO4-nonmuscle myosin heavy chain [1101] d  189036 b,c  145/5.29  145/5.23  14/16 (87%)  13%  New 
Leukocyte elastase inhibitor [318]  P30740  42/5.71  42.7/5.9  9/13 (69%)  22%  2.4 
Grancalcin [1004]  P28676  24/5.36  24.0/5.02  7/10 (70%)  31%  New 

Down-regulated 
Adenosylhomocysteinase [324]  P23526  48/5.82  47.7/6.04  7/9 (78%)  14%  0.4 
PEST phosphatase interacting protein homolog [234] g  4100162 f  48/5.30  47.6/5.35  11/13 (85%)  30%  0.5 

a TrEMBL accession number. 
b Accession number and theoretical pI and Mr for the unmodified protein are indicated. 
c NCBI accession number. 
d See text for explanation. 
e Among three experiments, the ratio of PO4-stathmin expression increase following LPS exposure in the presence of SB203580 divided by that 
in the absence of SB203580 was 0.93. 
f Genpept accession number. 
g This search was performed using average masses measured by linear mode MALDI-TOF MS. 

Table V 

Analysis of pH 5.5-6.7 two-dimensional PAGE gels 
Results are from pooled samples for control (n = 3) and LPS-exposed (n = 3) PMNs from human donors. Expression of the reported proteins was 
altered > 1.5-fold following LPS exposure in two repeat experiments. 

Identification [spot no.]  Swiss-Prot no.  Estimated Mr/pI  Theoretical Mr/pI  Peptides matched/submitted (%)  Protein covered (%)  Change (-fold) 

Up-regulated 
Transaldolase [475]  P37837  38/5.95  37.5/6.36  13/17 (76%)  33%  2.5 
Isocitrate dehydrogenase [431]  O75874  46/6.25  46.7/6.35  7/7 (100%)  13%  2.3 
Moesin [201]  P26038  61/6.09  67.8/6.07  11/13 (85%)  17%  2.1 
α-Enolase [459]  P06733  43/5.64  47.2/7.01  7/10 (70%)  17%  3.8 

Down-regulated 
Calponin H2 [240]  Q99439  34/6.65  33.7/6.94  10/11 (90%)  27%  0.5 



regulatory factor 3, a known regulator of interferon-stimulated 
gene transcription, is not a direct target of p38 kinase. 2 There- 
fore, gene expression analysis of LPS-stimulated PMNs has 
uncovered a previously uncharacterized signal transduction 
system that is sensitive to inhibition of p38 MAPK. 

Knowledge of the genes down-regulated by LPS permits the 



development of further hypotheses addressing PMN function in 
the face of infection. Strikingly, several down-regulated genes 
and gene products are structural in nature (e.g. paxillin, acti- 
nin, calponin H2) (Tables II and V). A known consequence to 
the PMN of LPS exposure is decreased motility (48). Up-regu- 
lation of genes for adhesion molecules (ICAM-1, CD44, AL- 
Fig. 2. A, the predicted sequence of the tryptic phosphopeptide in 
PO4-stathmin (1468.72 Da). The peptide mass measured by MALDI- 
TOF MS and the predicted mass differed by 14 ppm. As indicated, two 
alternate phosphorylation sites are possible: serine 16 and serine 25. B, 
PO4-stathmin and stathmin were identified on the control and LPS- 
exposed pH 5.0-6.0 gels. Consistent with phosphorylation, the PO4- 
stathmin spot was distinguished by a peptide of mass 1468.72 Da (i.e. 
80 Da greater than the peptide of 1388.72 Da seen in the stathmin spot). 
Assuming that no other multiply phosphorylated stathmin species have 
escaped detection, analysis of the integrated intensities of the PO4- 
stathmin and stathmin spots indicates that the percentage of the PO4 
form of total cellular stathmin has increased from 11% to 38% with LPS 
stimulation. The decrease in integrated intensity for stathmin was 
equal in amount to the increase in PO4-stathmin following LPS 
exposure. 

CAM, and TSG-6), and down-regulation of genes for structural 
proteins, indicates a genetic basis for this observation. Down- 
regulation of two genes implicated in cytoskeletal regulation, 
Pix-a and RhoB, was also observed. The calcium-binding pro- 
tein S100A4, down-regulated in LPS-treated PMNs (Table II), 
has been implicated in cell motility and metastasis (49). De- 
creased motility may be beneficial in sustaining the inflamma- 
tory response at sites of infection. In addition, LPS treatment 
results in an inhibition of apoptosis (50). Therefore, the longer 
residence time of the PMN at sites of infection is consistent 
with the long term genetically coded changes seen in these 
gene-profiling experiments and indicates that the changes in 
gene expression are functionally relevant to host defense and 
immunity. 

By providing information on post-translational modification, 
the proteomics data may provide further insights into the cy- 



Table VI 

Effect of SB203580 on LPS-stimulated gene expression 
Genes are reported for which the SB203580/control expression ratio 
is < 0.60. 

Gene name  -fold change ratio  Change in absence of SB203580 (-fold) 

ISG15  0.09  22.5 
HCR  0.38  20.8 
Mx-1  0  19.4 
IFI56  0  12.1 
PI-9  0.57  9.5 
Ets-2  0.59  7.4 
IL-6  0.45  6.3 
Rel  0.50  6.2 
LIMS1  0.58  6.1 
C3AR1  0.49  6.1 
INDO  0.35  5.2 
KIAA0105  0.41  5.1 
SNAP23  0.58  5.0 
SLPI  0.58  4.7 
ELANH2  0.49  4.6 
HM-74  0.57  3.8 
PKR  0  3.7 
MAD  0.21  3.6 
IFIT4  0.12  3.6 
Glycerol kinase  0  3.6 
IFI54  0  3.5 
IFI58  0.39  3.5 
IFP35  0.46  3.0 



Table VII 

Effect of SB203580 on LPS-stimulated protein expression 

Protein name  -fold change ratio (SB203580/control)  Change in absence of SB203580 (-fold) 

Up-regulated 
Proteasome β chain  0.8  1.51 
Annexin III  0.6  1.37 
Actin fragment [544]  0.8  1.74 
Actin fragment [591]  0.8  1.60 
α-Enolase  0.6  1.65 
Rab-GDP dissociation inhibitor β  1.1  1.24 
Glutathione S-transferase P  1.2  1.54 
Pre-B-cell colony enhancing factor  1.2  1.29 

Down-regulated 
Adenylyl cyclase-associated protein 1  1.3  0.53 
Rho-GAP1  0.8  0.67 
Ficolin 1  1.0  0.74 



toskeletal remodeling effects of LPS upon the PMN. We con- 
tend that the actin fragments identified (Table III) are unlikely 
to represent technical artifacts. Rather, their specificity (iden- 
tical molecular weight/pI among different experiments), statis- 
tically significant up-regulation by LPS, as well as the use of a 
lysis buffer containing chaotropes and multiple protease inhib- 
itors argue instead that these fragments are physiologic con- 
sequences of LPS exposure in the human PMN. More specifi- 
cally, the up-regulation of these fragments following LPS 
exposure (Table III) suggests that LPS may activate an actin- 
cleaving enzyme, which, in turn, remodels the cytoskeleton. 
Intriguing in this vein, calpain has recently been reported to 
play an important role in cell migration and cytoskeletal orga- 
nization of fibroblasts (51). The possibilities that LPS may 
induce calpain activation and that calpain activation may reg- 
ulate cytoskeletal reorganization and motility are currently 
under investigation. An alternative possibility is that actin 
cleavage is a marker of neutrophil apoptosis (52). 

Other LPS-regulated proteins may play important roles in 
cytoskeletal reorganization. The up-regulation of protein-ty- 
rosine kinase 9-like (A6-related protein) may modulate LPS- 
Table VIII 

LPS-regulated proteins for which a probe was present on the 
Affymetrix chip 

A comparison of corresponding protein and mRNA transcript change 
following LPS exposure is shown. 

Protein  Protein change (-fold)  mRNA change (-fold) 

Up-regulated 
Proteasome β chain  1.5  1.9 ↑ 
Leukocyte elastase inhibitor  [illegible]  4.6 ↑ 
Rab-GDI β  1.24  [illegible] 
Grancalcin  New  NC a 
Transaldolase  2.5  NC 
Moesin  2.1  NC 
Nonmuscle myosin heavy chain  New  NC 
Glutathione S-transferase P  1.54  Absent 
Pre-B cell enhancing factor  1.29  Absent 
Isocitrate dehydrogenase  2.3  Absent 
PO4-stathmin  2.1  Absent (stathmin) 
Protein phosphatase 1, β catalytic subunit  2  Absent 
Annexin III  3.1  3.1 ↓ 

Down-regulated 
Adenylyl cyclase-associated protein 1  1.9  2.1 ↓ 
Rho-GAP1  1.5  2.7 ↓ 
Ficolin 1  1.4  1.7 ↓ 
Adenosylhomocysteinase  2.5  Absent 
Calponin H2  2  NC 

a NC, no measurable change. 

induced actin polymerization, because it bears a high degree of 
homology to twinfilin (A6), an actin monomer-binding protein 
that localizes to sites of rapid filament assembly in cells and is 
believed to regulate actin filament turnover (53). In turn, LPS- 
induced down-regulation of Rho-GTPase activating protein 1 
(Table III) may regulate twinfilin (and protein-tyrosine kinase 
9-like) activity, because twinfilin has been shown to colocalize 
with Rac1 and Cdc42 and to be regulated by active Rac1 in NIH 
3T3 cells (53). Activation of Rho proteins may be facilitated by 
LPS up-regulation of moesin (Table V), because moesin report- 
edly induces the dissociation of Rho from GDI (54). Rac1 may, 
in turn, promote activation of the actin filament-nucleating 
Arp2/3 complex through interactions with WASP (Wiskott-Al- 
drich syndrome protein) family proteins (55) and, interestingly, 
is postulated to regulate the dynamics of both the actin and 
microtubule cytoskeletons via phosphorylation of stathmin (Ta- 
ble IV) (56). Calponin H2 is an actin-binding protein not pre- 
viously reported in PMNs that is postulated to play a role in 
cytoskeletal organization (57). Its down-regulation by LPS (Ta- 
ble V) likely modulates LPS-induced cytoskeletal reorganiza- 
tion. The up-regulation of nonmuscle myosin heavy chain and a 
putative phosphorylated form of myosin heavy chain (putative 
protein kinase C substrate by prediction rules) in the LPS- 
exposed PMN (Table IV) is of uncertain significance; myosin 
has been implicated in multiple functions in the PMN, includ- 
ing locomotion, fluid pinocytosis, and phagocytosis (58). Of 
interest, however, S100A4 (down-regulated, Table II) has been 
reported to regulate cytoskeletal dynamics by inhibiting pro- 
tein kinase C-mediated phosphorylation of nonmuscle myosin 
heavy chain (59). 

LPS induction of stathmin phosphorylation (Table IV and 
Fig. 2) may represent another mechanism by which the cy- 
toskeleton is remodeled. Stathmin is a phosphoprotein report- 
edly involved in both signal transduction and in regulation of 
the microtubulin filament network; furthermore, phosphoryla- 
tion of stathmin has been reported to modulate its tubulin- 
binding avidity (60). Inferences can be made about both the 
phosphorylation site on PO4-stathmin and the responsible kinase 
induced by LPS. Four phosphorylation sites in stathmin 
have been well described: Ser16, Ser25, Ser38, and Ser63 (32, 33). 



Ser16 has been reported as a substrate for Ca2+/calmodulin 
(CaM)-dependent kinases (32), and Ser25 as primarily a substrate 
for p38 and ERK (33), with p34cdc2 also active but bearing 
a 5-fold preference for Ser38 (34). As stated above, the 
phosphopeptide identified in PO4-stathmin, extending from 
residues 15 to 27 (1468.7 Da), is consistent with phosphorylation 
of either Ser16 or Ser25 (Fig. 2). Although both p38δ and 
p38α MAPK isoforms are expressed in the human PMN, LPS 
has been shown to selectively activate the p38α isoform in 
human PMNs (9). The p38α isoform, however, has been shown 
to be relatively inactive at Ser25; in fact, p38δ is ~100-fold more 
active at Ser25, and selective p38α inhibitors do not inhibit the 
stress-activated phosphorylation of stathmin in 293 cells (33). 
Further support for the lack of involvement of p38 signaling in 
phosphorylation of stathmin in our system is the apparent lack 
of effect of SB203580 (a selective p38α and p38β inhibitor) on 
LPS-induced expression of PO4-stathmin (Table IV). Because 
p34cdc2 is relatively inactive at Ser25 (34), we conclude that the 
phosphorylation site is likely to be Ser16, a reported substrate 
of CaM-dependent kinase. Although CaM kinases have previously 
been implicated in gene activation in LPS-exposed myelomonocytic 
HD11 cells (61), stathmin signaling has not, to 
our knowledge, been previously reported in either PMNs or 
lipopolysaccharide signal transduction. 

Cytoskeletal reorganization, a well-described regulator of 
granule release (62), may underlie LPS-induced priming for 
PMN granule release, but several LPS-regulated proteins may 
provide more specific clues. LPS exposure led to increased 
levels of grancalcin, a calcium-binding protein previously de- 
tected in PMNs and shown to translocate to granules and 
plasma membrane in the presence of physiologic concentra- 
tions of calcium (63). Similarly, annexin III, a calcium-binding 
protein highly expressed in PMN granule membranes and im- 
plicated in calcium-mediated secretion (64) and in granule fu- 
sion (65), was also found to be up-regulated. Exocytosis of 
granule contents may also be facilitated by LPS up-regulation 
of Rab-GDP dissociation inhibitor (Table III), which has been 
proposed to recycle Rab after vesicle fusion by extracting it 
from the membrane and loading it onto newly formed transport 
intermediates (66). 

Parallel use of DNA microarrays and proteomics affords a 
powerful strategy for comparison of corresponding mRNA tran- 
scripts and proteins, thereby affording new insight into the 
mechanisms by which the cell regulates its signaling responses 
to the external environment. Of interest, a poor correlation was 
found between corresponding transcripts and proteins (Table 
VIII), as reported in other systems (17, 18). The finding in some 
cases of unchanged transcript abundance in the face of regu- 
lated protein levels indicates post-transcriptional modulation 
following LPS exposure. The finding of undetected transcripts 
in the face of regulated levels of the corresponding proteins 
may indicate previous transcription of these genes in an earlier 
state of the myeloid maturation of the PMN, producing stable 
protein species that have undergone post-translational alter- 
ation following LPS exposure. The use of SB203580, a p38 
inhibitor, adds further insights into the mechanisms of LPS 
regulation. At the level of mRNA expression, SB203580 inhib- 
ited 23% of LPS-stimulated genes by >40% and 11% of genes 
by >60%; therefore, p38 plays a specific role in gene regulation 
in the PMN. In particular, proteasome β chain was up-regulated 
at both the mRNA transcript and protein level (Table 
VIII), with no notable effect of SB203580 on expression at 
either level, consistent with a non-p38-mediated pathway of 
primary transcriptional up-regulation induced by LPS. Simi- 
larly, CAP1, Rho-GAP 1, and ficolin 1 were down-regulated at 
both the mRNA transcript and protein level (Table VIII), with 
no notable effect of SB203580, consistent with a non-p38-mediated 
pathway of primary transcriptional down-regulation. 
Interestingly, annexin III was down-regulated at the transcript 
level and up-regulated at the protein level, with an inhibitory 
effect of SB203580 seen only at the protein level (Table VII), 
consistent with a p38-mediated post-transcriptional up-regula- 
tion induced by LPS. 

Limitations of the present study should be noted. Gene ex- 
pression analysis by cDNA microarrays does not distinguish 
between transcriptional regulation and mRNA stabilization; 
similarly, two-dimensional PAGE proteomics by itself does not 
distinguish among transcriptional, translational, or post-trans- 
lational regulation of protein abundance. Transcript detection 
by microarray technology is limited to the probes included; 
protein identification by two-dimensional PAGE proteomics is 
limited to well-resolved regions of the gel, may perform less 
well with hydrophobic and high molecular weight proteins, and 
tends to select for more abundant protein species (30). Harvest- 
ing of the LPS-incubated PMNs at 4 h may have prevented 
detection of earlier, transient changes and may have thereby 
introduced artifactual transcript-protein discordance. Further- 
more, the post-LPS incubation, pre-two-dimensional PAGE cell 
washes would be expected to remove secreted proteins from 
further analysis, with uncertain effects on detected protein 
abundance depending on such factors as the degree of de novo 
synthesis and extent of degranulation/exocytosis. Because pro- 
tein binding of Coomassie Blue has a limited dynamic range 
and is typically not linear throughout the range of detection, 
image analysis of Coomassie Blue-stained protein spots should 
be considered semi-quantitative. For some protein spots, the 
apparent magnitude of regulation by LPS may have been 
blunted by the spot approaching staining saturation in the 
control gel. By limiting our analysis to those protein spots 
common to all twelve pH 3.0-10.0 two-dimensional gels, we 
likely excluded some LPS-regulated proteins that happened to 
be either poorly resolved on a subset of the gels or unmatched 
by the image analysis software. By further limiting the analy- 
sis to those matched spots on the pH 3.0-10.0 gels for which a 
two-tailed t test demonstrated p < 0.05, the list of regulated 
proteins was likely also limited by statistical power. In addition 
to those regulated proteins listed in Table III, three others were 
up-regulated and three down-regulated with p < 0.09 (data not 
shown). 

Limiting our reported results to those changes that met 
statistical significance among the donors carries further impor- 
tant implications. We have encountered a two order of magni- 
tude range of response in unselected donor LPS-induced PMN 
functions, such as TNF-α and superoxide anion release (data 
not shown). The sources of this physiologic heterogeneity re- 
main uncertain but may possibly include such factors as nat- 
ural mutations of the LPS receptor component, TLR4 (67). By 
selecting for LPS effects common to all donors, we may not have 
characterized the range of genomic and proteomic heterogene- 
ity present in the population and thereby may have focused on 
only a narrow portion of a broader biological response to LPS. 
We contend that this reductionist approach is valid because it 
would be expected to enrich for biologically integral responses 
of the PMN to LPS. Nevertheless, correlation of genomic and 
proteomic profiles with functional phenotypes of the PMN may 
bear important diagnostic and therapeutic implications and 
will be pursued in future studies. 

Widespread regulation of numerous noncytokine/chemokine 
genes and proteins in the LPS-stimulated human PMN is a 
novel finding. These data indicate that, despite a narrow scope 
of gene expression in the nonstimulated state, the terminally 
differentiated, short-lived PMN likely plays a role in the innate 



immune response that is far more sophisticated and dynamic 
than the simple release of preformed inflammatory mediators. 
Although gene expression appears to be an important mecha- 
nism by which PMNs respond acutely to infection, mRNA tran- 
script/protein concordance is limited, and post-transcriptional 
(and post-translational) modifications also play an important 
role. The alteration of multiple transcriptional regulators, G-protein 
regulators, PO4-stathmin, and protein phosphatase 1 
indicates that one of the responses to LPS exposure is to modify 
subsequent signaling events by bacterial components or by 
other cytokines and chemokines. Finally, the finding that p38 
MAPK mediates LPS regulation of a limited subset of tran- 
scripts and proteins underlines the continuing need to define 
signal transduction cascades in the neutrophil. 

Although the mature neutrophil is one of 
the better characterized mammalian cell 
types, the mechanisms of myeloid differentiation 
are incompletely understood at 
the molecular level. A mouse promyelocytic 
cell line (MPRO), derived from murine 
bone marrow cells and arrested developmentally 
by a dominant-negative 
retinoic acid receptor, morphologically 
differentiates to mature neutrophils in the 
presence of 10 µM retinoic acid. An extensive 
catalog was prepared of the gene 
expression changes that occur during 
morphologic maturation. To do this, 3'-end 
differential display, oligonucleotide 
chip array hybridization, and 2-dimensional 
protein electrophoresis were used. 
A large number of genes whose mRNA 
levels are modulated during differentiation 
of MPRO cells were identified. The 
results suggest the involvement of several 
transcription regulatory factors not 
previously implicated in this process, but 
they also emphasize the importance of 
events other than the production of new 
transcription factors. Furthermore, gene 
expression patterns were compared at 
the level of mRNA and protein, and the 
correlation between 2 parameters was 
studied. (Blood. 2001;98:513-524) 

Introduction 

Studies of normal myeloid maturation from many laboratories have 
identified genes that may play critical roles in myeloid differentiation. 1-4 
Current studies suggest that these events are dependent on a 
cascade of molecular changes that involve complex modulation of 
mRNA transcription. Furthermore, studies of acute leukemia have 
suggested that the disease arises from the accumulation of myeloid 
precursors arrested at early stages of differentiation and associated, 
in many cases, with chromosomal rearrangements that alter the 
structure of specific transcription factors. 5 Nevertheless, the molecu- 
lar events underlying the production of mature myeloid cells are 
not well understood and appear to use interacting pathways and 
networks, the elucidation of which requires an extensive descrip- 
tion of the molecular components available to the myeloid cell. 

An extensive body of information is accumulating with respect 
to gene expression profiles of mammalian cells. However, much of 
the information available in public databases has been accumulated 
by the use of techniques such as single oligonucleotide chips or 
cDNA arrays that measure fewer than 6000 of potentially 30 000 to 
120 000 transcripts. The more limited range of analyses reported by 
the serial analysis of gene expression (SAGE) 6,7 technique accu- 
rately estimates changes in levels of the more abundant mRNAs but 
requires extensive redundant analyses to measure changes in the 
patterns of expression of scarce mRNAs. We have used a modified 
polymerase chain reaction (PCR)-based cDNA differential display 
(DD) method in which single restriction fragments derived from 
the 3' end of cDNAs are separated on a sequencing gel. 8,9 Bands 
from the gel can be identified initially by sequencing, but then 



comparison of patterns from different samples can be made without 
further sequencing. This sensitive and reproducible method detects, 
in principle, most cDNAs regardless of whether they are repre- 
sented in existing databases. 

Systematic analysis of the function of genes can also be 
performed at the protein level. This approach has the advantage of 
being closest to function, because proteins perform most of the 
reactions necessary for the cell. The most common method of 
proteome analysis is the combination of 2-dimensional gel electro- 
phoresis (2DE) to separate and visualize protein and mass spectrom- 
etry (MS) for protein identification. 10 Several such analyses of 
yeast and of normal or malignant mammalian cells have been 
performed. To date, however, there have been few studies in which 
both mRNA and protein have been compared by applying analyses 
to the same samples. The studies of Anderson 11 and Gygi 12 showed 
that there is not a good correlation between mRNA and protein 
levels, in yeast or human liver cells. However, other analyses 
disagree with this conclusion (Greenbaum et al, manuscript 
submitted, and Futcher et al 14 ). Furthermore, global correlations 
between changes in mRNA and protein levels have not been 
examined during the execution of any developmental program. 

The MPRO cell line was derived by transduction of a dominant- 
negative retinoic acid receptor construct into normal mouse bone 
marrow cells. It is a granulocyte-macrophage colony-stimulating 
factor (GM-CSF)-dependent line arrested at a promyelocytic stage 
of development. 15,16 After treatment with all-trans retinoic acid 
(ATRA) most of the cells acquire the morphology of mature 
neutrophils and begin to produce neutrophil lactoferrin and gelatinase, 
2 proteins characteristic of neutrophil secondary granules. 17 
As such, it offers a valuable model for studying neutrophil 
differentiation in vitro. 

We now report the analysis of mRNA expression changes 
during the process of MPRO cell maturation to neutrophils and 
compare the results with a limited analysis of cellular protein 
composition. mRNA expression changes were studied by combin- 
ing the use of oligonucleotide arrays and DD. A database (dbMC) 
with comprehensive genomic information for the myeloid differentiation 
program was constructed (accessible at 
http://www.bioinfo.mbb.yale.edu/expression/neutrophil). We have grouped the changes in 
mRNA levels of a large number of genes into 6 patterns, with 
implications for the genetic program of myeloid differentiation. 

We also compared 2-dimensional high-resolution gel electro- 
phoretograms from control cells and cells differentiated for 72 
hours in the presence of ATRA. Fifty protein spots whose relative 
intensity changed prominently during differentiation were exam- 
ined by mass spectrometry. The results suggest a poor correlation 
between mRNA expression and protein abundance, indicating 
that it may be difficult to extrapolate directly from individual 
mRNA changes to corresponding ones in protein levels (as 
estimated from 2DE). 



Materials and methods 

Cell lines 

MPRO cells and HM-5 cells provided by Dr Schickwann Tsai (Fred 
Hutchinson Cancer Research Center, Seattle, WA) 15 were used throughout 
the study. The cells proliferated continuously as a GM-CSF-dependent cell 
line at 37°C in Iscoves modified Dulbecco medium (Gibco BRL, Grand 
Island, NY) supplemented with 5% to 10% fetal calf serum (Gibco BRL) 
and 10% HM-5-conditioned medium as a source of GM-CSF. Morphologic 
differentiation of the blocked MPRO promyelocytes was induced by 
treatment with 10 µM ATRA (Sigma, St Louis, MO). Controls were 
cultured in the absence of ATRA but with the same volume of ve- 
hicle (ethanol). 

RNA isolation and differential display 

After exposure to 10 µM ATRA for 0, 24, 48, or 72 hours, total cellular 
RNA was isolated from MPRO cells using TRIzol reagent (Life Technologies, 
Gaithersburg, MD). cDNA was then synthesized using a T-7 Sal-Oligo 
d(T) 32 primer as described previously. 8,18 The double-stranded cDNA was 
digested with 1 of 9 different restriction enzymes (ApaI, BglII, BamHI, 
EagI, EcoRI, HindIII, XbaI, KpnI, and SphI) and ligated to Y-shaped 
adaptors with a complementary overhang. DNA fragments were then 
amplified by PCR as described previously. 8,18 PCR products were separated 
on a sequencing gel of 6% polyacrylamide with 7 M urea. The gel was dried 
and exposed to x-ray film. Genes from differential display gels, whose 
maximum intensity changes equaled 2+ on a scale of 1+ to 8+, were 
recorded as significantly changed. 19 Individual DNA bands were recovered 
from the gels, amplified by PCR, and sequenced. 

Oligonucleotide chip analysis of RNA samples 

Ten micrograms total RNA from each sample (0, 24, 48, or 72 hours) was 
used to prepare cDNA. This cDNA was transcribed with T7 RNA 
polymerase to prepare a fluorescently labeled probe. 20,21 Each sample was 
hybridized to a mouse array chip (Mu11K Array; Affymetrix, Santa Clara, 
CA) containing oligonucleotide probe sets corresponding to approximately 
7000 known genes or ESTs represented by UniGene clusters. 22 cDNAs 
were considered present if their probe set results were rated as such by the 
GeneChip software (Affymetrix) and if the average difference (AD) 
between perfect match and mismatch probe pairs was not less than 100 U. If a 



gene was represented by more than one array probe set, the average of all 
probe sets for the gene was taken. Genes with AD values between 100 and 
200 were considered unchanged because of their low expression levels. 
Those genes with AD values equal to or more than 200 U at one time point 
were further studied by rescaling, threshold, and normalization methods 
described in the MIT Center for Genome Research Web site. 13 A value of 20 
was assigned to any gene with an AD below 20 at some time point. 
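
The filtering rules in this subsection can be sketched in a few lines of Python. This is only an illustrative reading of the text, not the authors' software: the function names are hypothetical, each profile is assumed to hold the AD at 0, 24, 48, and 72 hours, and the 100 U presence cutoff is assumed to apply to the highest AD across the 4 time points.

```python
from statistics import mean

FLOOR = 20.0         # AD values below 20 are raised to 20 (from the text)
PRESENT_MIN = 100.0  # minimum AD for a gene to count as present (from the text)
STUDY_MIN = 200.0    # AD needed at one or more time points for further study


def average_probe_sets(probe_sets):
    """Average several probe-set profiles for one gene, time point by time point."""
    return [mean(values) for values in zip(*probe_sets)]


def floor_profile(profile, floor=FLOOR):
    """Raise every AD below the floor to the floor value."""
    return [max(ad, floor) for ad in profile]


def classify(profile, called_present):
    """Label one gene as 'absent', 'unchanged_low', or 'study'.

    Assumption: the cutoffs are compared with the peak AD of the profile.
    """
    peak = max(profile)
    if not called_present or peak < PRESENT_MIN:
        return "absent"
    if peak < STUDY_MIN:
        return "unchanged_low"
    return "study"
```

For example, a gene with two probe sets would first be averaged with `average_probe_sets`, then floored with `floor_profile`, and only then passed to `classify`.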

Bioinformatics and database development 

All the sequences or gene fragments were searched using Blast against 
GenBank and TIGR gene indices. A database of genes or ESTs whose 
expression levels changed during myeloid differentiation was constructed 
containing information for each band or gene. This included GenBank 
matches, Locus Link or Unigene clusters, expression patterns, tissue 
distribution, synonym(s) protein name, gene name(s), notations of possible 
functions, poly A signal and sequence quality, and hyperlinks to the 
database searches, sequence trace files, and related references. All gene data 
were then gathered into a cluster file. Supplementary information is 
available at http://bioinfo.mbb.yale.edu/expression/neutrophil. 

Classification and analysis of DNA fragments 

Sequences from differential display analyses were classified as representing 
known genes, ESTs, genomic sequences, or novel genes as described. 19 - 23 
Known genes from both differential display and arrays were clustered into 
27 functional categories and searched against SWISS-PROT 
(http://www.expasy.cbr.nrc.ca/cgi-bin/sprot-search-ful) or PIR 
(http://www.pir.georgetown.edu/). Information such as function, subcellular location, 
family and superfamily classification, map position, similarity, synonym(s) 
protein name, gene name(s), and so on was recorded in a variety 
of databases. 

Northern blot analysis 

Thirty micrograms total cellular RNA per lane from time-course MPRO 
cells were loaded onto 1.2% formaldehyde-agarose gels, then transferred to 
Hybond-N+ membranes (Amersham Pharmacia Biotech, Uppsala, Swe- 
den). After standard prehybridization, membranes were hybridized over- 
night at 65°C with radiolabeled cDNA probes (ordered from Research 
Genetics according to their dbEST Image ID). Membranes were washed at a 
final stringency of 60°C in 0.1× SSC. 

Immobilized pH gradient 2-dimensional gel electrophoresis 
and mass spectrometry 

Induced MPRO cells collected at 0 and 72 hours were lysed with lysis 
buffer (540 mg urea, 20 mg dithiothreitol, 20 µL Pharmalyte [3-10], 1.4 mg 
phenylmethylsulfonyl fluoride, 1 µg each aprotinin, leupeptin, pepstatin A, 
and antipain, 50 µg TLCK, and 100 µg TPCK/1 mL). We applied 100 µL 
each MPRO cell lysate (2.5 × 10^6 cells/100 µL) to immobilized pH 
gradient (IPG) strips (pH 3-10 L; Amersham Pharmacia Biotech), and IPG 
electrophoresis was conducted for 16 hours (20 100 Vh) using an Immobi- 
line Drystrip Kit (Amersham Pharmacia Biotech). Electrophoresis in the 
second dimension was carried out in a 12% sodium dodecyl sulfate- 
polyacrylamide gel electrophoresis (SDS-PAGE) gel with the Laemmli- 
SDS continuous system in a Protean II xi 2-D cell (Bio-Rad) run at 40 mA 
constant current for 4.5 hours. Proteins were detected by Brilliant Blue 
G-colloidal staining. 24 Protein spots were excised from the gel and digested 
with trypsin. ACTH clip (average [M+H] 2466.70) and bradykinin 
(average [M+H] 1061.23) were used for calibration of peptide masses. One 
microliter sample digest was mixed with 1.0 µL α-cyano-4-hydroxycinnamic 
acid (4.5 mg/mL in 50% CH3CN, 0.05% TFA) matrix solution and 
1 µL calibrants (100 fmol) each. The spectra of the peptides were acquired 
in reflector/delayed extraction mode on a Voyager-DE STR mass spectrom- 
eter (Perseptive Biosystems, Foster City, CA). Peptides were identified 
using the ProFound search engine. 39 
Results 

Differentiation of MPRO cells 

Figure 1 illustrates the morphologic changes in an MPRO cell 
population representative of those used for RNA expression 
analysis. Undifferentiated MPRO cells resembled promyelocytes 
under the light microscope (Figure 1A). After induction with ATRA 
for 24 hours, the cells morphologically differentiated into metamyelocytes 
(Figure 1B). At 48 hours, the cells further developed 
into metamyelocytes and band neutrophils (Figure 1C). At 72 
hours, nearly 100% of MPRO cells became mature neutrophils 
(Figure 1D). 

Identification of mRNAs by differential display assay 

MPRO cellular mRNA was analyzed at 0, 24, 48, and 72 hours after 
ATRA treatment. Nine restriction enzymes were used in a 3'-end 
DD approach. During MPRO differentiation, 1109 fragments 
corresponding to 837 transcripts were found to change substan- 
tially in expression levels (Figure 2). These represented approxi- 
mately 279 known genes, 112 ESTs, and 59 putative new genes, 
each with a perfect or fair polyadenylation signal at an appropriate 
distance from the oligo-dT priming site. The gene information 
detected by DD was collected in database dbMCd. 

Identification of mRNAs by oligonucleotide chip assay 

We used an oligonucleotide chip containing 13 179 probe sets 
corresponding to approximately 7000 murine genes to analyze 
patterns of mRNA expression in the same RNA samples used for 
DD. The information obtained by oligonucleotide arrays was 
collected in the database dbMCa. 

We clustered the genes by their similarity to idealized 
expression patterns. For instance, the expression pattern of an 
ideal gene that is overexpressed (high) at time 0 and underexpressed 
(low) at 24, 48, and 72 hours, would be high-low-low-low 
(HLLL). Overall we have $2^4 - 2 = 14$ idealized patterns, excluding 
HHHH and LLLL. Pearson correlation was used as the 




Figure 1. Morphology of MPRO cells during differentiation. MPRO cells were 
induced as described in "Materials and methods," concentrated by cytospin, and 
Wright-Giemsa stained. (A) Uninduced MPRO cells. (B) MPRO cells induced with 
ATRA for 24 hours. (C) MPRO cells induced with ATRA for 48 hours. (D) MPRO cells 
induced with ATRA for 72 hours. 




Figure 2. Distribution of genes obtained by DD assay. MPRO cell mRNA was 
analyzed at 0, 24, 48, and 72 hours after ATRA treatment; 1109 fragments 
corresponding to 837 transcripts were found to change substantially in expression 
levels. The total 837 transcripts were classified into 6 categories according to the 
bioinformatic analysis. Percentages show the gene distributions in these 6 catego- 
ries. Information for each transcript was collected in database dbMCd. 

measure of similarity of each gene expression pattern, 
$x = (x_1, x_2, x_3, x_4)$, to each of the 14 idealized patterns, 
$y = (y_1, y_2, y_3, y_4)$. The 4 entries of x and y corresponded to the 
4-dimensional gene expression levels at 0, 24, 48, and 72 hours, 
respectively. Each gene was assigned to a cluster labeled by the 
idealized pattern that had the maximal correlation with that 
gene. We selected only genes that hybridized well compared 
with the background (considered "present" by GeneChip soft- 
ware) and had maximal AD amplitude greater than 200 U in at 
least 1 of the 4 stages. We further tabulated the 14 patterns 
according to whether the gene expression changed at early 
(0-hour), intermediate (24- and 48-hour), and late (72-hour) 
time points and whether gene expression monotonically in- 
creased (up-regulated), monotonically decreased (down-regu- 
lated), or was not monotonic (transient). Table 1 shows 8 
clusters of 104 genes that had significant changes of mRNA 
levels, arranged according to the temporal stage and the 
monotonic/transient changes of expression levels. 
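
The pattern-assignment step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and coding H as 1.0 and L as 0.0 is an assumption (any two distinct values with H above L give the same Pearson correlation).

```python
from itertools import product
from math import sqrt


def idealized_patterns(n_points=4):
    """Return every H/L pattern of the given length except the constant ones."""
    patterns = {}
    for levels in product("HL", repeat=n_points):
        if len(set(levels)) > 1:  # drops HHHH and LLLL
            patterns["".join(levels)] = [1.0 if c == "H" else 0.0 for c in levels]
    return patterns


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def assign_pattern(profile, patterns):
    """Pick the idealized pattern with the maximal correlation to the profile."""
    return max(patterns, key=lambda name: pearson(profile, patterns[name]))
```

With `patterns = idealized_patterns()`, a profile that is high only at 0 hours, such as `[900, 120, 100, 110]`, is assigned to HLLL.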

Principal component analysis determined whether we could 
comprehensively present multidimensional data (4-dimensional in 
our case) in a simple 2-dimensional graph. First, we found the 4 
principal components, which were the axes of the most compact 
4-dimensional ellipsoid that encompassed the 4-dimensional cloud 
of data. Each axis was a different linear combination of the original 
4 variables. Then we verified that the first 2 principal components 
(the first 2 largest axes of the ellipsoid) captured most (95.2%) of 
the variation of the data. Therefore, the data could be faithfully 
projected (with a minor loss of information) into a 2-dimensional 
graph, with the 2 largest principal components as the x- and y-axes. 
As shown in Figure 3, genes tend to coalesce in clusters, according 
to their labels determined by their similarity to an ideal expression 
pattern. In summary, a genomic (global) picture of the distribution 
of genes according to their similarity to predetermined idealized 
multidimensional expression patterns is concisely displayed in a 
2-dimensional graph. 
Table 1. Genes differently regulated during the different stages of mouse promyelocytic cell line differentiation process 

Up-regulation 
  Early, LHHH (n = 10): MadP2rx1 Itgb2 H1r2 Lcn2 ItprS Cebpb H2-D Etohi6 Zyx 
  Middle, LLHH (n = 6): Piral Cybb Pfc PiraS Cd53 Ifngr2 
  Late, LLLH (n = 13): in a CsflrU Crs/SIOOao l-ulk uiss Aldol Rac2 Fpr1 Ctsd Ubb Ptmb4 

Down-regulation 
  Early, HLLL (n = 11): Tcrg-V4 Ly64 Ctsg Spi2-1 Mcpt8 Myc Myb Tlr4 Npm1 Erh Hsp60 
  Middle, HHLL (n = 1): Mpo 
  Late, HHHL (n = 37): Actx lrf2 EL2 Rpl19 Actb Ly6e Atf1 Hist2 
    Psma2 Gnas Zfp36 Il4ra LtorShfdgl 
    Max Rps8 Csf2rbl Slpi Tctexl Tpi Btf3 
    Cntf Gys3 Sic 10a1 CtsbSeppI Rtn3 
    Ccnb2 S100a9 Cf1 1 Hist5-2ax Rela 
    Copa Gstml Gnb2-rs1 Grn RPL8 

Transient 
  LLHL (n = 9): Sell M/2Pira6 PirbLstl Ltf Sema4d State Mmp9 
  LHHL (n = 17): Cebpa Lyzs Fcgr3 Arf5 Lampl Stat3 Csf2ra Osi 
    Actg Sfpil Gpx3 Ptprc Prtn3 Irf1 Rps6ka1 
    Ltb4r Myln 

Arrays of Affymetrix Mu11k containing 13103 probe sets corresponding to 12002 GenBank accessions were used for hybridization. Arrays were hybridized with 
streptavidin-phycoerythrin (Molecular Probes) biotin-labeled RNA and scanned. Intensity for each feature of the array was captured using Genechip software (Affymetrix), and 
a single raw expression level for each gene was derived from the 20 probe pairs representing each gene using a trimmed mean algorithm. For each gene, an AD of 24-, 48-, and 
72-hour samples was calibrated by dividing the slope of the linear regression line for a graph with the x-axis the AD of 0-hour probe sets and the y-axis the AD of the respective 
time point (24, 48, or 72 hours). A threshold of 20 U was assigned to any gene with a calculated expression level below 20 because discrimination of expression below this level 
could not be performed with confidence. 38 Each gene expression profile was categorized as described in Tables 3, 4, and 5. For the 4 time points, the minimum AD of the 
relatively higher group (MIN-H) was divided by the maximum AD of the relatively low group (MAX-L), and those genes whose MIN-H/MAX-L was greater than 2 were selected as 
meaningfully regulated. Genes were sorted in descending order based on the MIN-H/MAX-L. Genes in boldface are those whose expression level was in the top 20% (ie, 
maximum AD of 4 time points greater than 3000), and genes in italics are those in the bottom 20% (ie, maximum AD of 4 time points less than 300). The differentiation period 
was grouped into 3 stages: early (0-hour), middle (24-hour and 48-hour), and late (72-hour) stages. 

AD indicates average difference; gene symbols are expanded in an Appendix at the end of this article. 
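
The MIN-H/MAX-L selection rule in the table footnote can be written out as a short Python sketch. The function names are hypothetical, and the code assumes each gene already carries its 4 AD values (floored at 20 U, so the denominator is never zero) and the H/L pattern it was assigned.

```python
def min_h_over_max_l(profile, pattern):
    """Divide the lowest AD in the high group by the highest AD in the low group."""
    if len(profile) != len(pattern):
        raise ValueError("profile and pattern must have the same length")
    high = [ad for ad, level in zip(profile, pattern) if level == "H"]
    low = [ad for ad, level in zip(profile, pattern) if level == "L"]
    if not high or not low:
        raise ValueError("pattern needs at least one H and one L")
    return min(high) / max(low)


def is_meaningfully_regulated(profile, pattern, cutoff=2.0):
    """Apply the footnote's rule: keep genes whose ratio is greater than 2."""
    return min_h_over_max_l(profile, pattern) > cutoff


def sort_by_ratio(genes):
    """Sort (name, profile, pattern) tuples in descending order of the ratio."""
    return sorted(genes, key=lambda g: min_h_over_max_l(g[1], g[2]), reverse=True)
```

A profile of `[900, 120, 100, 110]` with pattern HLLL gives 900/120 = 7.5 and is kept, whereas `[300, 200, 100, 90]` with pattern HHLL gives exactly 2.0 and is not.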
Figure 3. Gene clusters in the first 2 principal component spaces. Principal 
component analysis allowed us to present the multidimensional data (in this case, 
4-dimensional data of each gene expression pattern) in a simple 2-dimensional 
graph. We derived the 4 principal components, which are a linear combination of the 
standardized expression intensities (zero mean and unit variance) at 0, 24, 48, and 
72 hours. The first 2 principal components captured most of the variation of the data 
(approximately 85%). Therefore, the data can be displayed (with a minor loss of 
information) in a 2-dimensional graph. The first and second principal components, $c_1$ and 
$c_2$, are given by the linear combinations $c_1 = 0.747\,n_1 - 0.11\,n_2 - 0.656\,n_3 + 0\,n_4$ 
and $c_2 = 0.278\,n_1 + 0.353\,n_2 + 0.233\,n_3 - 0.863\,n_4$, where $n_1$, $n_2$, $n_3$, 
and $n_4$ are the rescaled and standardized expression levels at 0, 24, 48, and 72 
hours, respectively. The axes legends $c_1$ and $c_2$ stand for the first 2 principal 
components. In this paper we used the Pearson correlation to measure the similarity 
of each gene with the idealized expression patterns, as opposed to the Euclidean 
distance we used in a previous work, 19 because clusters were better separated using 
this measure. In both cases, we presented the data in the 2-dimensional space of the 
lowest principal components. The data had a tendency to be circularly distributed 
when we used the Pearson correlation as a distance measure. 
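
As an illustration of how the coefficients in this legend place a gene on the 2-dimensional graph, the sketch below standardizes each time point across a set of genes and projects every profile onto the two reported linear combinations. It is a sketch under stated assumptions: the function names are hypothetical, the standardization is assumed to be per time point across genes using the population standard deviation, and the loadings are copied from the legend rather than recomputed.

```python
from statistics import mean, pstdev

# Loadings for the first 2 principal components, as reported in the legend.
C1 = (0.747, -0.11, -0.656, 0.0)
C2 = (0.278, 0.353, 0.233, -0.863)


def standardize_columns(rows):
    """Give each time point (column) zero mean and unit variance across genes."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    if any(sd == 0 for sd in sds):
        raise ValueError("a time point with no variation cannot be standardized")
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def project(row, loadings):
    """Linear combination of one standardized profile with one set of loadings."""
    return sum(w * v for w, v in zip(loadings, row))


def to_plane(rows):
    """Map each 4-point profile to its (c1, c2) coordinates."""
    return [(project(r, C1), project(r, C2)) for r in standardize_columns(rows)]
```

Because every column is centered before projection, the projected points are themselves centered on the origin of the graph.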



Correlation between array and DD analyses 

We have previously demonstrated a correlation coefficient of 0.93 
between visual estimates of changes in band intensity on DD and 
Phosphorimager System (Molecular Dynamics, Sunnyvale, CA) 
estimates of band intensity and a correlation coefficient of 0.88 
between hybridization intensity changes of mRNA on Northern 
blot analyses and changes in band intensity on DD. 19 In a few cases 
there were clear discrepancies in the pattern of expression of a 
gene, as estimated by DD and by oligonucleotide chip analysis. We 
chose the 6 most extreme cases and examined the levels of mRNA 
change for these genes by Northern blot analysis (Figure 4). In 5 
cases, the Northern blot results agreed with the results of the DD 
analysis, whereas the results of Gnb2-rs1 disagreed with the 
oligonucleotide array but duplicate bands from DD showed a 
relatively high level of expression in the 0 time sample that did not 
correlate with the Northern blot (Table 2). One possible explana- 
tion for these findings was the change in the relative use of different 
polyadenylation sites after the addition of ATRA to the MPRO cells. 
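
As an illustration of how agreement between the two methods can be quantified for a single gene, the sketch below computes a Pearson correlation between the array AD values and the DD band intensities for the Ly6e row of Table 2. The text does not state which correlation statistic underlies the coefficients quoted above, so the use of Pearson's r here is an assumption, and with only 4 time points per gene the result is a rough indicator at best.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Ly6e at 0, 24, 48, and 72 hours, taken from Table 2.
ly6e_array = [3061, 5391, 2844, 1282]  # AD value by array
ly6e_dd = [3, 2, 1, 1]                 # semiquantified DD band intensity

r = pearson(ly6e_array, ly6e_dd)
print(f"Ly6e array vs DD: r = {r:.2f}")
```

The same function could be applied to every gene measured by both methods to flag the most discordant cases for Northern blot follow-up.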

Constructing a database for mRNA level changes during 
myeloid differentiation 

Based on the data obtained above, an in-house database (dbMC) 
was constructed that included 2 subdatabases, dbMCd and dbMCa, 
for collecting gene information from DD or oligonucleotide arrays, 
respectively. Each entry in dbMC is accompanied by a so-called 
executive summary. The linkage between dbMCd and dbMCa was 
established by UniGene ID and cluster ID. dbMC contains the 
temporal expression patterns of genes during the MPRO cell 
differentiation process, including not only products represented in 
public databases but also novel transcripts. 

Figure 4. Northern blot analysis of selected mRNAs. Equivalent amounts of RNA 
from MPRO cells induced by ATRA at different time points (0 hour, 24 hours, 48 hours, 
and 72 hours) were resolved by formaldehyde-agarose gel electrophoresis, stained 
to verify the amount of loading. Eleven genes were separately probed on the RNA 
filters. The gene symbol of each probe was listed at the left of a related Northern blot 
result. Detailed information on these 11 probes was listed in Table 5. One of the 
RNA-blotted membrane photographs is shown with methylene blue-stained 28S and 
18S RNA subunits demonstrating the quality and quantity of RNA loaded in 
individual lanes. 

Analysis of gene expression patterns during MPRO 
differentiation 

Many of the genes identified in this study were found in myeloid 
cells or were implicated in myeloid development for the first time. 
We detected 8 cytokines 25 and chemokines whose mRNA levels 
changed more than 5-fold by arrays and 2-fold by DD during the 
maturation of MPRO cells (see our Web site, 
http://bioinfo.mbb.yale.edu/expression/neutrophil). Among these were 2 members of 
the CC chemokine family. Interleukin-1α (IL-1α) was up-regulated 
at the late stage of differentiation (LLLH pattern, Table 1). 

mRNA for approximately 52 receptors was detected by one or 
the other method. A number of the receptors known to be present on 
mature neutrophils showed late induction of mRNA, and their 
levels of induction were high, indicating that the expression of 
these products is a prominent event late in neutrophil maturation 
(Table 3). Rarely was mRNA for receptors down-regulated, 
consistent with myeloid maturation being accompanied by increas- 
ing responsiveness of the cell to a variety of external stimuli. 

Expression of mRNA for granule proteins 

Neutrophils contain several types of granules that develop at 
different stages of myeloid maturation. 31 7,26 Levels of mRNAs 
encoding secondary granule proteins, such as lactoferrin, increased 
as the cells matured (Table 4). The level of mRNA for Mmp9, 
reported as a tertiary granule protein, increased markedly between 
24 and 48 hours after the induction of differentiation, whereas 
mRNAs for secondary granule proteins either increased less 
markedly or showed a maximum increase by 24 hours. mRNAs for 
several primary granule constituents, such as myeloperoxidase and 
cathepsin G, were present in unstimulated cells and decreased as 
the cells matured. There was a discrepancy in the measurements of 
proteoglycan mRNA by DD and oligonucleotide chips, but North- 
ern blots showed that it reached a peak at 48 hours and then 
declined (Figure 4). Cathepsin D is reported as a primary granule 
protein, but its pattern of mRNA expression more closely re- 
sembled that of secondary granule constituents. In addition to 
known granule components, mRNAs for several other cathepsins 
were up-regulated during myeloid differentiation, in parallel with 
or later than the tertiary granule protein mRNAs. 

mRNAs for transcription factors 

Transcription factor genes, including several identified at the sites 
of consistent chromosome rearrangements in acute myeloid leuke- 
mia, have been implicated in normal myeloid differentiation and in 
the expression of neutrophil proteins. 2-5,27 However, comprehensive 
information concerning the expression of these transcription fac- 
tors during myeloid development is not readily available. There- 
fore, we compared gene names and identifiers in our databases to 
those of the transcription factor database Transfac (http:// 



Table 2. Expression patterns of genes detected by Northern blot analysis 



Gene        Gene         AD value by array               Intensity by DD
symbol      accession    0 h     24 h    48 h    72 h    0 h     24 h    48 h    72 h

Cebpa       M62362       33      212     182     44
Cebpb       X62600       390     1248    1380    1903
Cebpd       X61800       157     262     168     430
Cebpe
Myb         M12848       892     356     230     435
Slpi        U73004       617     501     783     402     1       2       3       3
Prg3        W45834       153     259     339     345     5       1       1       2
Gnb2-rs1    X75313       4231    3623    3215    3403    4       4       1       1
Ly6e        U04268       3061    5391    2844    1282    3       2       1       1
Lsp1        M90316       65      376     840     28      2       3       5       6
Actb        X03765       3095    3588    3976    2434    1       2       3       2



Gene symbol and gene accession refer to National Center for Biotechnology Information databases and, in particular, to Locus Link. AD value is the average difference in 
the value of hybridization intensity between the set of perfectly matched oligonucleotides and the set of mismatched oligonucleotides in the oligonucleotide array. Band 
intensities from DD were semiquantified on a scale from 1 (+) to 8 (+ + + + + + + +). These estimates are shown as boldface numbers in this table. 19 Both AD value and 
intensity of genes were studied at 4 time points corresponding to MPRO cells induced for the indicated times. 

DD indicates differential display; MPRO, mouse promyelocyte cell line; for gene symbols, see the Appendix at the end of this article. 

Table 3. Receptors expressed during myeloid differentiation process 

Gene          Gene            AD value by array
symbol        accession       0 h           24 h          48 h          72 h

Maximal fold change: less than 2
Bzrp          [illegible]     641           658           881           887
Cmkar4        [illegible]     508           447           378           684
Crry          M34173          [missing]     384           506           506
Csf2rb1       M34397          [illegible]   345           410           241
Htr5a         Z18278          188           272           273           339
M6pr          [illegible]     536           409           408           649
MPPIR         [illegible]     232           84            63            381
TCRGB         M26053          [illegible]   212           244           299
Tnfrsf1a      M59377          0             1             1             [illegible]

Maximal fold change: 2 or more, less than 3
Cmkbr1        [illegible]     221           244           504           638
Crhr          [illegible]     121           200           250           355
Csf2ra        [illegible]     171           372           402           254
Ebi3          AF013114        [illegible]   270           428           148
Grid1         D10171          [missing]     164           150           257
Ifngr         J05265          141           263           327           251
Il2rg         U21795          205           184           231           477
Ldlr          X64414          1399          1653          1665          3968
P40-8         [illegible]     849           677           381           640
Plaur         X62701          312           443           476           734
Rarg          M34476          102           113           114           218
Srb1          U37799          126           232           132           258

Maximal fold change: 3 or more, less than 4
Cr2           [illegible]     83            138           243           77
Csf2rb2       [missing]       209           249           437           111
Fcer1g        [illegible]     2398          2766          3365          8751
Fcgr2b        [illegible]     1703          1652          1431          4605
Ifngr2        [illegible]     [missing]     2             2             3

Maximal fold change: 4 or more, less than 5
Nr4a1         X16995          96            188           202           401

Maximal fold change: 5 or more
Il1r2         X59769          482           1796          2872          3818
C5r1          L05630          185           434           808           1078
Drd2          X55674          0             0             0             219
Fcgr3         M14215          1             1             1             2
Fpr1          L22181          0             89            141           671
GCR           AA240711        2             0             0             0
L-CCR         AA034646        48            175           314           2056
NMDARGB       AAB20211        2             2             0             0
P2rx1         X84896          79            346           530           744
Pira1         U96682          0             43            172           378
Pira5         U96686          274           391           954           1874
Pira6         U96687          122           635           2014          1716
Pirb          U96689          191           445           966           747
Sell          M25324          46            104           570           20
Tcrg-V4       M54996          1650          78            65            315



Receptors are identified as present whose maximal AD values were more than or equal to 200 U in this study. Genes were sorted by their expression patterns as follows: 
first by the average difference value, then by the difference between minimum and maximum AD for the 4 time points, and last by the alphabetical order of gene symbols. Genes 
were ordered according to the maximal fold change of AD values. Abbreviations of gene names are taken from gene symbols listed in the Locus Link portion of the National 
Center for Biotechnology Information database where available. Numbers in bold denote those gene expression patterns obtained by differential display rather than by 
oligonucleotide array assays. The other information is presented as in the legend to Table 2. 

AD indicates average difference; gene symbols are expanded in an Appendix at the end of this article. 
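
The grouping used in Table 3 can be sketched as follows. The article does not spell out how maximal fold change was computed, so this example assumes it is the ratio of the largest to the smallest AD value across the 4 time points; the `floor` parameter, which lifts zero or near-zero readings to avoid division by zero, is a hypothetical choice and not something described in the text.

```python
def maximal_fold_change(ad_values, floor=1.0):
    """Ratio of the largest to the smallest AD value in a time course.

    Assumption: readings below `floor` are raised to `floor` so that
    absent transcripts (AD of 0) do not cause division by zero.
    """
    clipped = [max(v, floor) for v in ad_values]
    return max(clipped) / min(clipped)


def fold_change_bin(ad_values):
    """Assign a time course to one of the fold-change groups of Table 3."""
    fc = maximal_fold_change(ad_values)
    if fc < 2:
        return "Less than 2"
    if fc < 3:
        return "2 or more, less than 3"
    if fc < 4:
        return "3 or more, less than 4"
    if fc < 5:
        return "4 or more, less than 5"
    return "5 or more"


# AD values at 0, 24, 48, and 72 hours, taken from Table 3.
print(fold_change_bin([1399, 1653, 1665, 3968]))  # Ldlr
print(fold_change_bin([96, 188, 202, 401]))       # Nr4a1
```

Under this definition a gene that falls sharply, such as Tcrg-V4, is grouped with genes that rise sharply, which matches how down-regulated receptors appear in the "5 or more" group.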



www.transfac.gbf-braunschweig.de/TRANSFAC) and determined 
which factors contained in this database were present at detectable 
levels in MPRO cell mRNA, using Affymetrix software for the 
criteria for inclusion of mRNAs from approximately 200 murine 
transcription factor probe sets on the oligonucleotide chip. Of 
these, 54 were expressed and 13 showed changes of 3-fold or more 
in chip signal (Table 5). 

The changes in certain transcription factors, such as the moderate 
down-regulation of myb and myc and the up-regulation of the Max 
dimerization protein MAD, were consistent with the shift of the cells 



from a proliferative to a differentiated state. 28 Some changes are more 
difficult to explain, such as the up-regulation of DP1, a partner for E2f 
factors in the regulation of S-phase genes, and the mild up-regulation of 
the Id genes, commonly associated with an inhibition of differentiation 
by competition with bHLH transcriptional activators. 29 

The C/EBP family has been extensively studied with respect to 
myeloid differentiation. 2,30 Absolute levels of the C/EBP α and δ 
mRNAs were low, probably at the borderline of significance for the 
oligonucleotide chip assay, whereas the level of C/EBP β appeared 
higher. In addition, there were discrepancies between the chip 

Table 4. Granule constituents expressed during mouse promyelocyte cell line cell differentiation 

Gene          Gene            AD value by array
symbol        accession       0 h       24 h      48 h      72 h

Azurophil (primary) granules
Man2c1        AA161860        178       134       99        164
Ctsb          M65270          442       480       595       389
Ctsd          X52886          214       1087      1828      2784
Ctsg          M96801          1509      405       46        286
E12           U04962          658       1273      843       157
Ela2          AA689016        47        159       134       163
Gus-s         M63836          544       226       266       254
Lyzs          M21050          0         1         1         3
Mcpt8         X78545          831       268       66        491
Mpo           X15378          3788      3009      776       692
Prg           X16133          2621      2653      2920      9859

Possible granule proteins
Ctsc          AA144887        252       194       342       576
Ctse          X97399          1         3         4         5
Ctsh          U06119          45        124       195       156
Ctsl          X06086          16        11        31        237
Ctss          AA089333        12        9         88        463

Specific secondary granules
Cpa3          J05118          621       270       90        801
Cd36l2        AB008553        113       93        157       187
Cnlp          X94353          80        479       704       626
Cybb          U43384          8         24        91        128
Ear2          [missing]       0         1         1         2
Fpr1          L22181          178       220       235       846
Itgb2         X14951          0         2         4         2
Lcn2          W13166          916       3513      3931      6036
Ltf           J03298          19        162       333       138
MBP           W45634          5         1         1         2
Mmp13         X66473          44        43        72        178
Ngp           L37297          2661      4782      2311      6912

Tertiary granules
Mmp9          Z27231          0         1         2         2



Shown are the possible granule protein cDNAs represented on the oligonucleotide arrays, sorted by their expression patterns as follows: first by the average difference AD 
value, then by the granule types, and last by the alphabetical order of gene symbols. Data are presented as described in the legend to Table 3. 
AD indicates average difference; gene symbols are expanded in an Appendix at the end of this article. 



estimates and the mRNA levels observed by Northern blotting with 
specific probes for these genes. In particular, the latter method, 
more sensitive and specific, showed that C/EBP α began to decline in 
the most mature cells, whereas C/EBP δ mRNA declined progressively 
beginning at 24 hours after the onset of differentiation. 

C/EBP ε is a more recently cloned C/EBP family member. Previous 
studies indicated it is expressed in a large array of human leukemia cell 
lines blocked at various stages of differentiation and that it is 
up-regulated during granulocytic differentiation. 31 A C/EBP ε probe was 
not included in the oligonucleotide chips, and this mRNA was not 
detected by DD. Therefore, we examined the C/EBP ε expression 
patterns by quantitative PCR and Northern blot analysis (Figure 4). 
C/EBP ε exon 1 was PCR amplified from MPRO RNAs using primers 
RY48 (AGCCCCCGACACCCTTGATGA) and RY49 
(TGGCACACTGCGGGCAGACAG). 32 The results showed that C/EBP ε is expressed 
throughout myeloid differentiation, with expression levels increased 
moderately in the later stages. 

We detected a number of other transcription factors that are 
broadly expressed or that have been reported in other studies of 
hematopoiesis (Table 5). Some of the factors that were most 
strongly induced during differentiation have been studied in other 
contexts but not previously implicated in hematopoiesis, such as a 
mammalian homologue to the Drosophila enhancer of split gene, a 
transcriptional silencer. The mammalian gene is expressed at 
relatively high levels as measured by the oligonucleotide chip and 



is a candidate for mediation of the silencing of growth-related 
genes in the maturing neutrophil. Another candidate transcriptional 
silencer, Tif1b, may serve as a corepressor for the KRAB domain 
family of zinc finger transcription factors and also may mediate 
binding of the heterochromatin protein HP1 to DNA. 33 

There were 26 transcription factors whose mRNAs showed no 
significant changes by oligonucleotide chip analysis and were not 
identified as differentially regulated genes by differential display 
assays. PU.1, a factor necessary for the production of neutrophils 
and the expression of several neutrophil genes, 34 showed less than a 
3-fold increase in mRNA, below the threshold for a significant 
change. Other candidate hematopoietic transcription factors, such 
as PEBPlaB2 (AML1), GATA-1, and SP-2, were represented on 
the oligonucleotide chips, but their mRNA levels were so low that 
they were reported as absent in this study. The possibility that small 
changes in the levels or ratios of some transcription factors could 
produce marked changes in transcription potentially limits the 
ability of data generated by present methods to explain transcrip- 
tional changes during differentiation. 

Protein expression patterns of MPRO cells during 
ATRA induction 

We visually compared the 2DE patterns from MPRO cells at the 
same time points used for mRNA analysis. In most cases the 

Table 5. Transcription modulators presented during myeloid differentiation 

Gene          Gene            AD value by array
symbol        accession       0 h       24 h      48 h      72 h

Maximal fold change: less than 2-fold
Zfp11-6       AB020542        2630      2989      2795      2515
Btf3          W13502          3         3         2         1
Gata2         AB000096        562       770       472       730
Hmgi          J04179          337       348       177       232
Idb1          M31885          455       787       721       637
Max           M63903          256       224       312       172
Nfatc2        AA560093        2313      3218      2396      2542
Pml           U33526          173       281       329       306
Rarg          M34476          102       113       114       218
Rela          M61909          297       260       304       244
Sox15         W53527          419       461       484       837
Ybx1          M62867          643       489       472       496
Zfp162        Y12838          671       734       720       992

Maximal fold change: 2 or more, less than 3
Cebpd         X61800          157       262       168       430
Idb2          M69293          244       210       310       604
Jund1         W29356          1274      2002      1434      3085
Lyl1          X57687          399       342       347       891
Nfe2          L09600          458       743       1042      505
Nfkb1         L28117          953       2044      1876      2034
Pbx1          AF020196        611       303       345       212
Sfpi1         A34693          375       784       991       529
Tif1b         U67303          673       659       420       863
Trp53         P10361          259       149       125       361
Usf2          U12283          129       185       285       192
Ybx3          L35549          96        169       210       119
Zfp216        AA510137        82        151       204       106

Maximal fold change: 3 or more, less than 4
Irf1          M21065          85        207       278       198
Klf2          U25096          62        86        246       77
Myb           M12848          892       356       230       435
Stat3         AA395029        484       1057      1012      290
Tfdp1         Q08639          307       560       505       1093

Maximal fold change: 4 or more, less than 5
Cebpb         X62600          390       1248      1380      1903
Stra13        Y07836          223       383       510       936

Maximal fold change: 5 or more
Cebpa         M62362          33        212       182       44
Grg           X73359          99        565       916       1005
Mad           X83106          0         111       167       327
Myc           L00039          314       112       62        173
Etohi6        W89667          169       386       313       1003
TBX1          AA542220        0         0         1         2



Shown are the transcription factors identified as present by the oligonucleotide array analysis whose maximal AD between perfect match and mismatch oligonucleotide 
sets was greater than or equal to 200 U in this study. Data are presented as described in the legend to Table 3. 
AD indicates average difference; gene symbols are expanded in an Appendix at the end of this article. 



peptides identified for a given protein were derived from regions 
along the entire length of the protein, indicating the observed 
products were not the result of proteolytic degradation. These 
data must be considered with several caveats: membrane and 
other hydrophobic proteins and very basic proteins are not well 
displayed by the standard 2DE approach, and proteins present at 
low levels will be missed. 35 In addition, to simplify MS analysis, 
we used a Coomassie dye stain rather than silver to visualize 
proteins, and this decreased the sensitivity of detection of minor 
proteins. The MS method we used was sufficiently sensitive to 
identify proteins that could barely be visualized by colloidal 
blue staining. However, a limitation of the method for the mouse 
is that the current database lacks predicted amino acid sequences 
for a substantial fraction of murine genes. In addition, very 
small proteins give only a few peptides, making statistically 
confident identification difficult. 



Figure 5 shows the analytical colloidal blue-stained 2DE IPG 
reference maps of differentiated MPRO cells. Expression patterns 
of more than 500 protein spots were detected and observed through 
the entire series of gels. Protein spots could easily be cross- 
matched to each other, indicating the reproducibility of the method. 
As marked on the gel pictures (Figure 5), 50 proteins with a wide 
range of molecular weights (1 to 200 kd), isoelectric points (4 to 9), 
and abundances were subjected to MS protein identification. The 
results are presented in Table 6. 

Comparing the theoretical value of the molecular weight and pI 
of each protein to that of the observed value, we confidently 
identified 28 proteins in the expected position on the gels (spots 1 to 
28). Some of the other proteins with strong matches to the murine 
databases migrated to a somewhat unexpected pI position. Nine 
spots gave clear peptide peaks on mass spectroscopy but did not 
match any known gene. Their identification will require amino acid 

Figure 5. 2DE electrophoretograms of MPRO cells. 

MPRO cell lysate (2.5 × 10^6 cells/sample) was loaded for 
2DE analysis. Gels were stained with brilliant blue G-colloidal 
dye. (A) 2DE map of uninduced MPRO cells (0 hour). 
(B) 2DE map of matured MPRO cells (72 hours). Protein 
spots marked in the maps were considered differentially 
expressed and were subjected to MS analysis. The 
resultant protein information is listed in Table 6. 

sequence analysis or availability of more extensive murine databases. 
We searched for the expression patterns of the genes cognate 
to the expressed proteins in dbMC (Table 6). Nineteen genes were 
found in dbMC, the mRNA for 5 genes was reported as absent, and 
13 genes were present during MPRO differentiation. Comparison 
of the expression patterns showed only 4 genes of 18 present on the 
oligonucleotide chips whose expression was consistent at the RNA 
level and protein level. None of these was on the list of the genes 

Table 6. Correlation of expression patterns between mRNA level and protein level 



that were differentially expressed significantly (5-fold or greater 
change by array or 2-fold or greater change by DD). 



Discussion 

We explored the temporal patterns of gene expression during 
myeloid development. A database has been developed to provide a 

The proteins listed here are represented by the spots marked in the electrophoretograms shown in Figure 5. 

Protein definition, Gi number, and predicted value refer to the protein name, accession number, and properties derived from the National Center for Biotechnology 
Information protein database. The column labeled % shows the percentage of peptides predicted from the protein sequence that were detected by mass spectroscopy. The 
expression levels of protein spots in mouse promyelocytic cell line cells induced by all-trans retinoic acid for 0 hours and 72 hours (Figure 5) were scored on a scale of 
1 (+) to 8 (+ + + + + + + +) in the 2DE pattern column. The cDNA expression patterns of the cognate mRNAs are listed in the cDNA expression pattern column abstracted from 
the dbMC database. The genes not represented on the oligonucleotide arrays were marked as N/A. Ag showed the correlation of gene patterns at mRNA level or protein level. 

Y indicates agreement and N discrepancy between changes in cDNA and protein spot intensity. The numbers in bold were obtained with DD. 2DE indicates 2-dimensional 
gel electrophoresis; IgE, immunoglobulin E; DD, differential display. 

reference for later research on the molecular mechanisms underlying 
normal myeloid development. 

The MPRO cell system morphologically mimics normal myeloid 
differentiation and biochemically proceeds further toward mature neutro- 
phils than most other in vitro systems. Because the arrest in differentia- 
tion of MPRO cells growing in the absence of ATRA is not physiologic, 
there is a theoretical risk that gene expression in these cells is not 
coordinated in the way that it is in normal differentiation. It is 
encouraging that, for the most part, the timing of expression of genes for 
proteins of the various neutrophil granules is consistent with the timing 
of the morphologic and biochemical appearance of these granule 
components during normal myeloid differentiation. 

The DD technique provides certain advantages for detecting 
and comparing mRNA levels in different samples. First, the method 
is, in principle, similar to competitive RT-PCR, and, with the use of 
stringent PCR conditions, is expected to be about as reliable. 
Second, display patterns are reproducible. Third, the method 
detects the levels not only of RNAs already represented in the 
database but also of unknown RNA species that may represent 
"new" genes. Fourth, closely related genes can be distinguished 
regardless of cross-hybridization, provided there are some single 
nucleotide differences in the 3' end sequence. Limitations associ- 
ated with this technique are that numerous gels are necessary to get 
complete information and that comparison of the levels of different 
mRNAs is only approximate because of the differential amplifica- 
tion of bands of different size or sequence. 

Oligonucleotide chip analysis is a fast and effective means of 
accessing mRNA expression patterns. 20 Cluster analysis of groups 
of samples by this approach is effective. However, the present 
results indicate that alternative methods of verification are desir- 
able before the data on an unexpected change in a particular gene 
are definitively accepted. 

To obtain the broadest range of information from the myeloid 
differentiation process, both differential display and oligonucleotide 
chip techniques were applied in the current study. As a result, 65.3% of 
the observed changes in mRNA levels came from the differential display 
method and 41.5% came from oligonucleotide chip assays. 

Our data showed in general that changes in expression pattern 
by the 2 methods agreed qualitatively but that there was some 
quantitative variation. Our results indicate that DD may be a more 
accurate way to detect changes in levels of gene expression than the 
oligonucleotide chip assay. However, improvements in the types of 
oligonucleotides used in arrays may close this gap in the future. 

The mRNAs for a limited number of transcription factors vary in a 
pattern correlating with that of the mRNAs for primary or secondary 
granule proteins. However, more detailed information is needed, and the 
underlying mechanisms of granule gene regulation remain unclear. The 
number of potential positive and negative regulatory factors found here 
is sufficiently small as to make it feasible to perform in vivo studies, 
such as chromatin immunoprecipitation. 

The oligonucleotide chip used in this study focused on known 
genes, whereas the DD method samples all polyadenylated tran- 
scripts. The latter method generated a large number of products not 
associated with known genes, in part because the mouse genome is 
not as well represented in the database as the human genome. 
However, our experience with DD and human mRNAs indicates 
that substantial fractions of the products represented as ESTs or not 
represented at all in the public databases are cDNA copies from 
introns, hnRNA, or other RNA with internal A runs. 

Approximately 59 sequences obtained from gel-display bands 
had significant changes in the level of expression and a sequence 
that did not match that for any named gene in the public databases. 



Of these, 38 had plausible or excellent polyA signals. This is only 
an approximate estimate of the number of new genes found 36 
because a fraction of the mRNAs for known genes still had poor 
polyA signals. In addition, the full 3' untranslated region is often 
not known for characterized genes, and in some cases these new 
genes may prove to be identical to products identified by the 
oligonucleotide chips when more complete sequences are obtained. 
At the least, their presence indicates that a substantial fraction of 
the regulatory or functional circuitry of maturing myeloid cells 
remains unexplored and that valuable tools for their investigation 
will emerge from a combination of RNA expression studies and 
analysis of emerging genomic sequences. 

The desired end point for the description of gene expression in a 
biologic system is not only the analysis of mRNA transcript levels 
but also the accurate measurement of protein abundance. The 
developments in 2DE and new MS instrumentation make it 
possible to accomplish this work rapidly and efficiently. In this 
study, we attempted to identify a number of the proteins differen- 
tially expressed between uninduced and ATRA-differentiated MPRO 
cells and to examine the relation between mRNA and protein expression 
levels for these genes representing the same state. 

For protein levels based on estimated intensity of Coomassie dye 
staining in 2DE, there was poor correlation between changes in mRNA 
levels and estimated protein levels. Other groups have studied the 
correlation between mRNA and protein levels in yeast and liver 
cells. 1112 - 14 In the liver cell experiments, 1112 correlation coefficients of 
0.4 to less than 0.5 were observed. In an extensive study in yeast, 1112 the 
correlation coefficient was high if the most abundant mRNAs and 
proteins were considered. If a handful of these products was omitted, the 
remaining correlation coefficient was 0.4 or less. However, one 
could restore some of the correlation by averaging individual 
data points into broad proteomic categories. 37 

The discrepancies between mRNA and protein levels in MPRO cells 
appear to be substantially larger than those observed for yeast. Possible 
causes for the discrepancies include translational regulation, differential 
expression of certain mRNAs at various stages of cell growth in vitro, 
post-translational protein modification that varies with the stage of 
maturation of the cells, and selective degradation or excretion of proteins 
in vivo. Furthermore, here we are focusing on a developmental 
time-course, whereas the yeast study concentrated on the organism in 
vegetative growth. New techniques, equipment, and bioinformatic 
analysis tools must be developed to make such systematic, global, and 
quantitative analyses feasible. 

The initial studies of protein expression presented here provide a 
cautionary note for efforts to interpret cell composition and function in 
relation to mRNA levels. Discrepancies we observed between gene 
expression and protein abundance suggest that selective post-transcrip- 
tional controls may be at least as important as changes in mRNA levels 
in determining the protein composition of neutrophils and that they are 
phenomena less well explored than transcriptional control. Analysis of 
mRNA expression patterns is itself only a small beginning toward a 
genome- wide description of cellular components. 
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Appendix 

Gene symbols used in tables: Actb: actin, beta, cytoplasmic; Actg: actin, gamma, 
cytoplasmic; Actx: melanoma X-actin; Aldo1: aldolase 1, A isoform; Arf5: 
ADP-ribosylation factor 5; Atf1: activating transcription factor 1; Atf2: activating 
transcription factor 2; Btf3: basic transcription factor 3a; Bzrp: peripheral-type 
benzodiazepine receptor; C5r1: complement component 5, receptor 1/G 
protein-coupled receptor (C5a); Ccnb2: cyclin B2; Cd36l2: CD36 antigen (collagen 
type I receptor, thrombospondin receptor)-like 2; Cd53: CD53 antigen; Cebpa: 
CCAAT/enhancer binding protein (C/EBP), alpha; Cebpb: CCAAT/enhancer 
binding protein (C/EBP), beta; Cebpd: CCAAT/enhancer binding protein 
(C/EBP), delta; Cebpe: CCAAT/enhancer binding protein (C/EBP), epsilon; Cfl1: 
cofilin 1, nonmuscle; Cmkar4: chemokine (C-X-C) receptor 4; Cmkbr1: chemokine 
(C-C) receptor 1/Mip1a receptor; Cnlp: cathelin-like protein; Cntf: ciliary 
neurotropic factor/zinc finger protein PZF; Copa: coatomer protein complex 
subunit alpha; Cpa3: carboxypeptidase A3, mast cell; Cr2: complement receptor 
2; Crhr: corticotropin releasing hormone receptor; Crry: complement 
receptor-related protein; Csf1r: CSF 1 (M-CSF) receptor/c-fms/CD115; Csf2ra: CSF 2 
(GM-CSF) receptor, alpha, low-affinity/CD116; Csf2rb1: CSF 2 (GM-CSF) 
receptor, beta 2, low-affinity/IL-3 receptor-like protein (AIC2B)/CDw131; 
Csf2rb2: CSF 2 (GM-CSF) receptor, beta 2, low-affinity/IL-3 receptor (AIC2A); 
Ctsb: cathepsin B; Ctsc: cathepsin C; Ctsd: cathepsin D; Ctse: cathepsin E; Ctsg: 
cathepsin G; Ctsh: cathepsin H; Ctsl: cathepsin L; Ctss: cathepsin S; Cybb: 
cytochrome b-245, beta; Drd2: dopamine receptor 2; E2f1: E2F transcription 
factor 1; Ear2: eosinophil-associated ribonuclease 2; Ebi3: Epstein-Barr 
virus-induced gene 3/cytokine receptor-like molecule (EBI3); E12: Balb/c neutrophil 
elastase; Ela2: elastase 2; Erh: enhancer of rudimentary homolog (Drosophila); 
Etohi6: ethanol induced 6/sterol regulatory element binding transcription factor 1 
(SREBF1) homolog; F2rl2: coagulation factor II (thrombin) receptor-like 2; 
Fcer1g: Fc receptor, IgE, high affinity I, gamma polypeptide; Fcgr2b: Fc receptor, 
IgG, low affinity IIb; Fcgr3: Fc receptor, IgG, low affinity III; Fpr1: formyl 
peptide receptor 1/fMLP receptor; Gabpb1: GA repeat binding protein 
(GABP-beta1 subunit); Gata2: GATA-binding protein 2; Gnas: guanine nucleotide 
binding protein, alpha stimulating; Gnb2-rs1: guanine nucleotide binding protein, 
beta-2, related sequence 1; Gpx3: glutathione peroxidase 3; Grg: related to 
Drosophila groucho gene; Grid1: glutamate receptor channel subunit delta 1; 
Grn: granulin; Gstm1: glutathione-S-transferase, mu 1; Gus-s: beta-glucuronidase 
structural; Gys3: glycogen synthase 3, brain; H2-D: histocompatibility 2, D 
region locus 1; Hist2: histone gene complex 2; Hist5-2ax: H2A histone family, 
member X; Hmgi: high mobility group protein I; Hsp60: heat shock protein, 60 
kDa; Htr5a: 5-hydroxytryptamine (serotonin) receptor 5A; Idb1: inhibitor of 
DNA binding 1/helix-loop-helix DNA binding protein regulator (Id); Idb2: 
inhibitor of DNA binding 2; Ifngr: interferon gamma receptor; Ifngr2: interferon 
gamma receptor 2; Ii: Ia-associated invariant chain; Il1a: IL1 alpha; Il1r2: IL1 
receptor, type II; Il2rg: IL2 receptor, gamma chain; Il4ra: IL4 receptor, alpha; 
Il10rb: IL10 receptor, beta; Il17r: IL17 receptor; Irf1: interferon regulatory factor 
1; Irf2: interferon regulatory factor 2; Itgb2: integrin beta 2 (Cd18); Itpr5: inositol 
1,4,5-trisphosphate receptor (type 2); Jund1: Jun proto-oncogene-related gene 
d1/transcription factor JUN-D; Klf2: Kruppel-like factor LKLF; L-CCR: 
lipopolysaccharide inducible C-C chemokine receptor-related; Lcn2: lipocalin 2; Ldlr: 
low density lipoprotein receptor; Lsp1: lymphocyte-specific 1/S37/pp52; Lst1: 
leucocyte-specific transcript 1; Ltb4r: leukotriene B4 receptor; Ltbr: 
lymphotoxin-beta receptor; Ltf: lactotransferrin; Ly64: lymphocyte antigen 64; Ly6e: 
lymphocyte antigen 6 complex, locus E; Lyl1: lymphoblastoma leukemia/bHLH factor; 
Lyzs: lysozyme; M6pr: mannose-6-phosphate receptor, cation dependent; Mad: 
Max dimerization protein; Man2c1: mannosidase, alpha, class 2C, member 1; 
Max: Max protein; Maz: MYC-associated zinc finger protein (purine-binding 
transcription factor); MBP: eosinophil granule major basic protein precursor; 
Mcpt8: mast cell protease 8; Mll: myeloid/lymphoid or mixed-lineage leukemia; 
Mmp13: matrix metalloproteinase 13/collagenase; Mmp9: matrix 
metalloproteinase 9/gelatinase B; Mpo: myeloperoxidase; Myb: myeloblastosis oncogene; 
Mybl2: myeloblastosis oncogene-like 2; Myc: myelocytomatosis oncogene; 
Myln: myosin light chain, alkali, nonmuscle; Nfatc2: nuclear factor of activated T 
cells, cytoplasmic 2; Nfe2: nuclear factor, erythroid-derived 2, 45 kDa; Nfkb1: 
NF-kappa-B (p105); Ngp: neutrophilic granule protein; NMDRGB: 
N-methyl-D-aspartate receptor glutamate-binding chain homolog; Npm1: nucleophosmin 1; 
Nr4a1: nuclear receptor subfamily 4, group A, member 1; Osi: oxidative stress 
induced; P2rx1: purinergic receptor P2X, ligand-gated ion channel, 1; P2ry2: 
purinergic receptor P2Y, G-protein-coupled 2; P40-8: P40-8, functional/laminin 
receptor; Pbx1: pre B-cell leukemia transcription factor 1; Pfc: properdin factor, 
complement; Pira1: paired-Ig-like receptor A1; Pira5: paired-Ig-like receptor 
A5; Pira6: paired-Ig-like receptor A6; Pirb: paired-Ig-like receptor B; Plaur: 
urokinase plasminogen activator receptor; PMI: putative receptor protein (SP: 
P17152); Pml: promyelocytic leukemia; Prg: proteoglycan, secretory granule; 
Prg3: proteoglycan 3/eosinophil major basic protein 2; Prtn3: proteinase 3; 
Psma2: proteasome (prosome, macropain) subunit, alpha type 2; Ptmb4: 
prothymosin beta 4; Ptprc: protein tyrosine phosphatase, receptor type, C; Rac2: 
RAS-related C3 botulinum substrate 2; Rarg: retinoic acid receptor, gamma; 
Rela: avian reticuloendotheliosis viral (v-rel) oncogene homolog A/NF-kappa-B 
p65; Rpl19: ribosomal protein L19; RPL8: ribosomal protein L8; Rps6ka1: 
ribosomal protein S6 kinase polypeptide 1; Rps8: ribosomal protein S8; Rtn3: 
reticulon 3; S100a8: S100 calcium binding protein A8 (calgranulin A); S100a9: 
S100 calcium-binding protein A9 (calgranulin B); Sdfr2: stromal cell-derived 
factor receptor 2; Sell: selectin L (lymphocyte adhesion molecule 1); Sema4d: 
semaphorin 4D; Sepp1: selenoprotein P, plasma, 1; Sfpi1: SFFV proviral 
integration 1; Shfdg1: split hand/foot deleted gene 1; Slc10a1: solute carrier 
family 10 (sodium/bile acid cotransporter family), member 1; Slpi: secretory 
leukocyte protease inhibitor; Sox15: SRY-box containing gene 15; Spi2-1: serine 
protease inhibitor 2-1; Srb1: scavenger receptor class B1; Stat3: signal transducer 
and activator of transcription 3; Stat5a: signal transducer and activator of 
transcription 5A; Stat6: signal transducer and activator of transcription 6; Stra14: 
basic-helix-loop-helix protein-retinoic acid induced; Tbx1: TBX1 
protein/LPS-induced TNF-alpha factor homolog; Tcrgb: T-cell-receptor germline beta-chain 
gene constant region; Tcrg-V4: T-cell-receptor gamma, variable 4; Tctex1: 
t-complex testis expressed 1; Tfdp1: transcription factor Dp 1; Tif1b: 
transcriptional intermediary factor 1, beta; Tlr4: toll-like receptor 4; Tnfrsf1a: TNF 
receptor superfamily, member 1a; Tnfrsf1b: TNF superfamily, member 1b; 
Tomm70a: translocase of outer mitochondrial membrane 70 (yeast) homolog A; 
Tpi: triosephosphate isomerase; Trp53: transformation-related protein 53; Ubb: 
ubiquitin B; Usf2: upstream transcription factor 2; Ybx1: Y box transcription 
factor; Ybx3: Y box binding protein; Zfp11-6: zinc finger protein s11-6; Zfp18: 
zinc finger protein 18 homolog; Zfp36: zinc finger protein 36; Zfp162: zinc finger 
protein 162; Zfp216: zinc finger protein 216; Zfpm1: zinc finger protein, 
multitype 1; Znfn1a1: zinc finger protein, subfamily 1A, 1 (Ikaros); Zyx: zyxin. 



Opinion 

Comparing protein abundance and mRNA expression levels on a 
genomic scale 

Dov Greenbaum*, Christopher Colangelo†‡, Kenneth Williams†‡ and 
Mark Gerstein†§ 

Addresses: *Department of Genetics, †Department of Molecular Biophysics and Biochemistry, ‡HHMI Biopolymer Laboratory and W. M. Keck 
Foundation Biotechnology Resource Laboratory, and §Department of Computer Science, Yale University, New Haven, CT 06520-8114, USA. 

Correspondence: Mark Gerstein. E-mail: Mark.Gerstein@yale.edu. Kenneth Williams. E-mail: Kenneth.Williams@yale.edu 

Published: 29 August 2003 
Genome Biology 2003, 4:117 

The electronic version of this article is the complete one and can be 
found online at http://genomebiology.com/2003/4/9/117 

© 2003 BioMed Central Ltd 



Abstract 

Attempts to correlate protein abundance with mRNA expression levels have had variable success. 
We review the results of these comparisons, focusing on yeast. In the process, we survey experimen- 
tal techniques for determining protein abundance, principally two-dimensional gel electrophoresis and 
mass-spectrometry. We also merge many of the available yeast protein-abundance datasets, using the 
resulting larger 'meta-dataset' to find correlations between protein and mRNA expression, both 
globally and within smaller categories. 



Although some of the underlying technology for quantifying 
protein abundance was introduced almost thirty years ago 
[1,2], there has recently been a significant increase in the 
development of new tools. Concurrently, tools for analyzing 
mRNA expression are becoming more mainstream. The 
quantification of both of these molecular populations is not 
an exercise in redundancy; measurements taken from 
mRNA and protein levels are complementary and both are 
necessary for a complete understanding of how the cell 
works [3]. Additionally, as mRNA is eventually translated 
into protein, one might assume that there should be some 
sort of correlation between the level of mRNA and that of 
protein. Alternatively, there may not be any significant cor- 
relation, which, in itself, is an informative conclusion. 

The two commonly used high-throughput methods for mea- 
suring mRNA expression, microarrays and Affymetrix chips, 
have both been extensively reviewed elsewhere [4-6]. There 
are also two basic approaches to determining protein abundance: 
those based on two-dimensional electrophoresis and those based 
on mass spectrometry (Table 1). We provide a brief review of 
these technologies and of recent efforts to determine 
correlations between quantified protein abundances and 
mRNA expression. 

Methods for determining protein levels 
Two-dimensional electrophoresis 

Determining relative protein expression levels by conven- 
tional two-dimensional electrophoresis requires isoelectric 
focusing, SDS-polyacrylamide gel electrophoresis, staining, 
fixing, densitometry, and careful matching of the same spots 
on two or more gels. Differentially expressed spots are then 
excised and enzymatically digested, and the resulting pep- 
tides are identified using mass spectrometry. An attractive 
aspect of this approach is the low capital equipment cost, but 
a high level of expertise is needed to obtain reproducible 
gels, and two-dimensional electrophoresis is generally 
limited to proteins that are neither too acidic, too basic, nor 
too hydrophobic, and that are between 10 and 200 kDa in 
size, so that they are reliably separated on gels. Additionally, 
this approach detects only those proteins that are expressed 
at relatively high levels and that have long half-lives [7,8]. In 
one study using 40 µg yeast lysate, the average protein 
abundance detected was 51,200 copies per cell, with no 
proteins detected with abundances less than 1,000 copies per 
cell [8]. Given that 1,500 spots were resolved on a 1.0 pH 
unit gel [8], several gels covering different pH ranges would 
be needed to resolve a whole cell lysate. Given these 
limitations, conventional two-dimensional electrophoresis 
technology has limited potential for large-scale proteome 
analysis [8]. 

Table 1 

Overview of selected protein profiling technologies 

Two-dimensional gel electrophoresis 
    Type of labeling required: silver staining 
    Ability to detect many post-translational modifications: yes 
    Biomolecules that are optimally quantified: naturally occurring forms of proteins larger than 10 kDa 
    Approximate dynamic range (and reference): 10 [9] 
    Number of proteins/spots quantified (and reference): 1,500 [8] 

Differential two-dimensional fluorescence gel electrophoresis (DIGE) 
    Type of labeling required: in vitro with Cy2, Cy3 or Cy5 fluorophores at primary amines 
    Ability to detect many post-translational modifications: yes 
    Biomolecules that are optimally quantified: naturally occurring forms of proteins larger than 10 kDa 
    Approximate dynamic range (and reference): 10,000 [9] 
    Number of proteins/spots quantified (and reference): IJOO [51] 

SELDI- or MALDI-MS disease biomarker discovery 
    Type of labeling required: none 
    Ability to detect many post-translational modifications: yes 
    Biomolecules that are optimally quantified: naturally occurring forms of proteins smaller than 10 kDa 
    Approximate dynamic range (and reference): 25 
    Number of proteins/spots quantified (and reference): not applicable 

Isotope-coded affinity tag (ICAT) - LC/MS 
    Type of labeling required: in vitro with H/D or 12C/13C ICAT reagent at cysteine 
    Ability to detect many post-translational modifications: no 
    Biomolecules that are optimally quantified: cysteine-containing tryptic peptides from digests of protein extracts 
    Approximate dynamic range (and reference): 10,000* 
    Number of proteins/spots quantified (and reference): 496 [18] 

14N/15N - LC/MS 
    Type of labeling required: in vivo at nitrogens in amino acids 
    Ability to detect many post-translational modifications: yes 
    Biomolecules that are optimally quantified: tryptic peptides from digests of protein extracts 
    Approximate dynamic range (and reference): 10,000 [19] 
    Number of proteins/spots quantified (and reference): 872 [20] 

*Assumed to be similar to that for multidimensional protein identification. Abbreviations: SELDI-MS, surface-enhanced laser desorption ionization mass 
spectrometry; MALDI-MS, matrix-assisted laser desorption ionization mass spectrometry; LC/MS, liquid chromatography and mass spectrometry. 

Two-dimensional fluorescence-difference gel electrophoresis 
(DIGE) utilizes mass- and charge-matched, spectrally 
resolvable fluorescent dyes (such as Cy3 and Cy5) to label 
two different protein samples in vitro prior to two-dimen- 
sional electrophoresis. Its main advantage over conventional 
two-dimensional electrophoresis is that both the control and 
the experimental sample are run in a single polyacrylamide 
gel. The samples are then imaged separately but can be per- 
fectly overlaid without any 'warping' of the gels. This sub- 
stantially raises the confidence with which protein changes 
between samples can be detected and quantified. Changes in 
the relative level of expression of a protein may be detected 
that are as little as 1.2-fold for large-volume spots [9]. 
Because detection is based on fluorescence, DIGE has a large 
dynamic range of about 10,000, which permits differential 
expression analysis of proteins that are present at relatively 
low copy number [9]. The limit of detection of DIGE for 
quantifying protein expression ratios is between 0.25 and 
0.95 ng protein, which is similar to that for silver staining 
[9,10]. In a recent study [11], the relative levels of expression 
of approximately 1,050 protein spots were compared in 
250,000 laser-dissected normal versus esophageal carci- 
noma cells. This analysis identified 58 spots that were 



up-regulated by more than three-fold and 107 that were 
down-regulated by more than three-fold in cancer cells. 

Mass spectrometric approaches 

Disease biomarker discovery 

Current approaches to discovering protein or peptide 
markers of disease involve batch chromatography, matrix- 
assisted laser desorption ionization mass spectrometry 
(MALDI-MS) and statistical analysis of large numbers of 
disease versus normal serum or other biological samples. 
Most recent studies have relied on surface-enhanced laser 
desorption ionization time-of-flight mass spectrometry 
(SELDI-TOF-MS) [12,13]. The SELDI approach [13] involves 
using a gold-coated chip with eight or sixteen 2 mm spots 
that are modified with chromatographic surfaces (for 
example anionic, cationic, hydrophobic, and so on). After 
spotting a few microliters of serum, any contaminants and 
salt are removed by washing with water, and the target is 
dried by adding a MALDI matrix solution, such as α-cyano-4- 
hydroxy-cinnamic acid. In a study by Petricoin et al [14], 
SELDI-MS analysis of serum from 50 control and 50 case 
samples from patients with ovarian cancer resulted in identi- 
fying five peptide biomarkers that ranged in size from 534 to 
2,465 Da. The pattern formed by these markers was then 
used to correctly classify all 50 ovarian cancer samples in a 
masked set of serum samples from 116 patients who included 
50 patients with ovarian cancer and 66 unaffected women. 
Similar promising results have been reported in studies of 
serum samples from breast and prostate cancer patients 
[12,15]. In a recent study [16], which compared the relative 
ability of several different statistical approaches to classify 
samples based on MS data, the disease biomarker approach 
was extended to a conventional MALDI-MS platform. 
Although powerful, the disease biomarker approach does not 
provide accurate relative amounts of the control versus experi- 
mental biomarker, only the relative intensity difference. 

Isotope-coded affinity-tag-based protein profiling 
While both MALDI-MS-based disease biomarker discovery 
and DIGE comparatively profile the naturally occurring 
forms of peptides and proteins, isotope-coded affinity-tag 
(ICAT) analysis profiles the relative amounts of cysteine- 
containing peptides derived from tryptic digests of protein 
extracts. Because only a single tryptic peptide is needed to 
quantify the expression of the corresponding parent 
protein, the ICAT reagent utilizes a thiol protein-reactive 
group that attaches both a biotin tag and either nine 12C 
(light) or nine 13C (heavy) atoms to each cysteine residue. 
Following derivatization of the control protein extract with 
[12C]-ICAT reagent and the experimental extract with [13C]- 
ICAT reagent, the pooled samples are subjected to trypsin 
digestion followed by both cation and avidin chromatography. 
Liquid chromatography and tandem mass spectrometry 
(LC/MS/MS) is then used to identify ICAT peptide pairs 
and to quantify the relative 12C/13C ratios. It is important to 
note that the ICAT approach provides the relative expres- 
sion ratios of individual proteins under two conditions; it 
does not provide absolute protein concentrations, nor does 
it provide the ratio of the concentration of one protein rela- 
tive to another in a single condition. A nice feature of this 
approach is that the in vitro incorporation of a stable 
isotope into one of the two samples being compared obvi- 
ates the need to separately analyze the control and experi- 
mental samples by MS. Although a tryptic digest of a 
whole-cell human protein extract might produce more than 
500,000 peptides, less than 100,000 of these might be 
expected to contain cysteine, but based on a search of the 
SwissProt database [17], less than 5% of human proteins 
lack cysteine and would therefore be missed (that is, more 
than 95% of proteins would include at least one cysteine- 
containing peptide). 
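
As a rough illustration of the relative quantification described above, the sketch below derives one experimental/control ratio per protein from its light (12C, control) and heavy (13C, experimental) peptide-pair intensities. The function name, the input layout and the use of the median across peptide pairs are assumptions made for this example; they are not taken from the ICAT studies cited here.

```python
from statistics import median


def icat_protein_ratio(peptide_pairs):
    """Return a heavy/light (experimental/control) ratio for one protein.

    peptide_pairs: list of (light_intensity, heavy_intensity) tuples, one per
    cysteine-containing peptide pair assigned to the protein (hypothetical
    input layout). Pairs with a non-positive light intensity are skipped.
    """
    ratios = [heavy / light for light, heavy in peptide_pairs if light > 0]
    if not ratios:
        raise ValueError("no usable peptide pairs for this protein")
    # The median is an assumed, outlier-tolerant way to combine peptide ratios.
    return median(ratios)


# Hypothetical intensities for three peptide pairs from one protein.
example_pairs = [(100.0, 200.0), (50.0, 110.0), (10.0, 18.0)]
example_ratio = icat_protein_ratio(example_pairs)
```

A ratio obtained this way is only meaningful as a comparison of the same protein between the two conditions; it says nothing about absolute concentration or about one protein relative to another.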

ICAT results are analogous to those obtained by the use of 
two different fluorescent dyes in DNA microarray analysis of 
mRNA levels or DIGE analysis of protein expression. The 
largest number of proteins profiled so far using this 
approach with a single sample are the 491 proteins con- 
tained in microsomal fractions of naive and in vitro differen- 
tiated human myeloid leukemia cells [18]. 

Multidimensional protein identification technology 
Multidimensional protein identification technology 
(MudPit) is similar to ICAT in that it utilizes cation- 
exchange prefractionation followed by reverse-phase (RP) 
high-performance liquid chromatography (HPLC) separa- 
tion and MS/MS analysis [19]. In contrast to the ICAT 
approach, however, MudPit analyzes the entire mixture of 
tryptically digested proteins and utilizes tandemly coupled 



(cation-exchange followed by reverse-phase) columns. A 
specific subset of peptides is eluted from the cation- 
exchange column, using a step gradient of increasing salt 
concentration, onto the front of the RP column. Peptides are 
then eluted from the RP column and enter the mass spec- 
trometer for analysis. After the RP gradient is complete, the 
next step of the salt gradient releases another subset of pep- 
tides from the cation-exchange column onto the RP column, 
and the process repeats itself. Using this approach on the 
yeast proteome, Wolters et al [19] identified 5,540 unique 
peptides from 1,484 proteins and demonstrated a dynamic 
range of detection of 10,000-fold. This method has been 
extended to comparative protein profiling by using in vivo 
14N/15N metabolic labeling [20,21]. Washburn et al [20] 
used Saccharomyces cerevisiae grown in both 14N- and 15N- 
containing minimal media, and 2,264 peptides and 872 pro- 
teins were uniquely identified. Also, accurate 14N/15N 
quantitation was determined for each peptide with an 
average standard deviation of 30%. 

Comparison of mRNA and protein levels 

Even with the significant developments in the technologies 
used to quantify protein abundance over the past couple of 
years, protein identification and quantification still lags 
behind the high-throughput experimental techniques used 
to determine mRNA expression levels. Yet, while mRNA 
expression values have shown their usefulness in a broad 
range of applications, including the diagnosis and classifica- 
tion of cancers [22,23], these results are almost certainly 
only correlative, rather than causative; in the end it is most 
probably the concentration of proteins and their interactions 
that are the true causative forces in the cell, and it is the cor- 
responding protein quantities that we ought to be studying. 

Primarily because of a limited ability to measure protein 
abundances, researchers have tried to find correlations 
between mRNA and the limited protein expression data, in 
the hope that they could determine protein abundance 
levels from the more copious and technically easier mRNA 
experiments. Alternatively, if there is definitively no corre- 
lation between mRNA and protein data, both quantities 
could be used as independent sources of information for use 
in machine-learning algorithms, for example, to predict 
protein interactions. To date, there have been only a 
handful of efforts to find correlations between mRNA and 
protein expression levels, most notably in human cancers 
and yeast cells; for the most part, they have reported only 
minimal and/or limited correlations. 

One of the earliest analyses of correlation looked at 19 pro- 
teins in the human liver. Anderson and Seilhamer [24] 
found a somewhat positive correlation of 0.48. Another 
limited analysis, of the three genes MMP-2, MMP-9 and 
TIMP-1 in human prostate cancers, showed no significant 
relationship [25]. An additional cancer study [26] showed a 
significant correlation in only a small subset of the proteins 
studied. Conversely, Orntoft et al [27] found highly signifi- 
cant correlations in human carcinomas when looking at 
changes in mRNA and protein expression levels. 

Protein and mRNA correlations in yeast 

Many of the present efforts at correlating mRNA and protein 
expression have been conducted in yeast using two-dimen- 
sional electrophoresis techniques. In particular, Gygi et al 
[7] found that even similar mRNA expression levels could be 
accompanied by a wide range (up to 20-fold difference) of 
protein abundance levels, and vice versa. These results con- 
trast with those of Futcher et al [28], who found relatively 
high correlations (r = 0.76) after transforming the data to 
normal distributions. In a previous analysis [29], we merged 
the data from both of these datasets (referred to as 2DE-1 [7] 
and 2DE-2 [28]), comparing the resulting new larger protein 
abundance set ('merged dataset 1') with a comprehensive 
mRNA expression dataset. The mRNA expression reference 
set was constructed through iteratively combining, in a non- 
trivial fashion, three sets that used Affymetrix chips and a 
SAGE dataset [29]. Using these reference datasets, we were 
able to do an all-against-all comparison of mRNA and 
protein expression levels, in addition to a number of analy- 
ses comparing protein and mRNA expression using smaller, 
but broad categories [29,30]. 
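
The contrast between the weak and the relatively high correlations reported above depends partly on how the data are transformed before the coefficient is computed. The sketch below is a minimal illustration of that point: it computes a Pearson coefficient on raw values and again after a log10 transform. The log transform is used here only as an assumed stand-in for a normalizing transformation, and the numbers are invented for illustration; neither is taken from the cited studies.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log_pearson(xs, ys):
    """Pearson correlation after log10-transforming both (positive) series."""
    return pearson([math.log10(x) for x in xs], [math.log10(y) for y in ys])


# Hypothetical mRNA (copies per cell) and protein (thousand copies per cell).
mrna = [0.5, 1.0, 2.0, 5.0, 10.0, 50.0]
protein = [2.0, 1.5, 12.0, 9.0, 60.0, 400.0]

r_raw = pearson(mrna, protein)      # dominated by the most abundant ORF
r_log = log_pearson(mrna, protein)  # weights all abundance ranges more evenly
```

On skewed abundance data, a few highly expressed genes can dominate the raw coefficient, which is one reason the two values can differ noticeably.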

Given the difficult, laborious, and limiting nature of two- 
dimensional electrophoresis analysis, many of the newer 
protein abundance determinations have been done using 
MudPit and derivative technologies. Washburn et al [31] 
used MudPit to analyze and detect 1,484 arbitrary proteins: 
they were able to detect a somewhat random sampling of 
proteins independent of abundance, localization, size or 
hydrophobicity (we refer to this dataset as MudPit-1). In a 
further experiment, the authors, comparing expression ratios 
for both proteins and mRNA levels, found that although they 
could not find correlations for individual loci, they could find 
overall correlations when looking at pathways and com- 
plexes of proteins that functioned together [21]. Peng et al 
[32] analyzed 1,504 yeast proteins with a false-positive rate 
(misidentification of a protein) of less than 1% (we refer to 
this dataset as MudPit-2). In their analysis [32], they con- 
trasted their methodology with that of Washburn et al [31] 
with which there was significant overlap of proteins. 

A new merged dataset 

Expanding upon our previous merged dataset, we con- 
structed a new merged dataset ('merged dataset 2') using the 
two two-dimensional electrophoresis and two MudPit 
datasets described above. Succinctly (more information is 
available on our website at [33]), we transformed each of the 
protein-abundance datasets into more quantitative data by 
fitting each protein dataset individually onto the reference 
mRNA expression dataset. The MudPit-1 dataset was also 
fitted onto the more finely grained MudPit-2 dataset. Each 



of the new, fitted datasets was then inversely transformed 
back into protein space. These derived protein datasets were 
then combined into a larger reference dataset; when we had 
more than one abundance value for an open reading frame 
(ORF), we chose the value from the dataset according to a 
prescribed quality ranking (see Figure 1). The resulting set 
contained protein abundance information for approxi- 
mately 2,000 ORFs. (One caveat with the MudPit data: 
while quantitative analysis can be subsequently done on the 
results of MudPit experiments, MudPit data alone are only 
semi-quantitative, in that the number of peptides deter- 
mined is relative to the actual protein abundance within the 
cell [31]. Some may therefore argue that MudPit alone is not 
optimal for a comparison with mRNA data. Nevertheless, 
we feel that our methodical merging process creates a quan- 
titative and representative dataset that can be compared 
with the mRNA expression data.) Using the resulting data 
we could compare mRNA expression and protein abundance 
globally (Figure 1a) as well as looking at smaller, broad cate- 
gories, such as function or localization (see Figure 1b,c). In 
particular, we show that some localization categories - for 
example, the nucleolus - have significantly higher correlations 
than the global correlation. Other localizations may present 
less of a correlation between mRNA and protein data - for 
example, the mitochondria - possibly reflecting the heteroge- 
neous nature and function of the latter organelle. In terms of 
MIPS functional categories [34,35], we show that although 
some categories, such as cell rescue, show a lower correlation 
than the whole merged set, other functional categories, such 
as cell cycle, show a significant increase in correlation. Logi- 
cally, this increased correlation reflects the co-regulated 
nature of the proteins in this functional category. 
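
The final combination step, in which an ORF measured in several datasets takes its value from the highest-ranked one, can be sketched as follows. The ordering 2DE-1, 2DE-2, MudPit-2, MudPit-1 is the one given in the Figure 1 legend; the function name, the dictionary layout, and the ORF names and values are assumptions for illustration only. The fitting and inverse-transformation steps that precede the merge are not reproduced here.

```python
# Quality ranking as stated in the Figure 1 legend (best first).
QUALITY_ORDER = ["2DE-1", "2DE-2", "MudPit-2", "MudPit-1"]


def merge_by_quality(datasets, order=QUALITY_ORDER):
    """Merge per-dataset abundance tables into one reference table.

    datasets: {dataset_name: {orf: abundance}} (hypothetical layout).
    Returns {orf: (abundance, source_dataset)}, keeping for each ORF the
    value from the first dataset in `order` that contains it.
    """
    merged = {}
    for name in order:
        for orf, abundance in datasets.get(name, {}).items():
            # setdefault keeps the value from the higher-ranked dataset.
            merged.setdefault(orf, (abundance, name))
    return merged


# Hypothetical abundances (thousand copies per cell).
example = {
    "2DE-1": {"YAL001C": 10.0},
    "MudPit-2": {"YBR002W": 7.0},
    "MudPit-1": {"YAL001C": 99.0, "YBR002W": 5.0, "YCR003W": 3.0},
}
reference = merge_by_quality(example)
```

Recording the source dataset alongside each value makes it straightforward to check afterwards how much of the merged set comes from each technology.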

Reasons for the absence of correlation 

There are presumably at least three reasons for the poor 
correlations generally reported in the literature between the 
level of mRNA and the level of protein, and these may not be 
mutually exclusive. First, there are many complicated and 
varied post-transcriptional mechanisms involved in turning 
mRNA into protein that are not yet sufficiently well defined 
to be able to compute protein concentrations from mRNA; 
second, proteins may differ substantially in their in vivo half 
lives; and/or third, there is a significant amount of error and 
noise in both protein and mRNA experiments that limit our 
ability to get a clear picture [36,37]. 

Examining the first option - that there are a number of 
complex steps between transcription and translation - we 
looked at correlations between mRNA and protein abun- 
dance for those ORFs that had varied or steady levels of 
mRNA expression over the course of the cell cycle [38]. To 
normalize for the varied degrees of expression for different 
ORFs, we took the standard deviation divided by the average 
expression level as representative of the variation of each 
ORF over the course of the yeast cell cycle (Figure 2). 
Broadly speaking, the cell can control the levels of protein at 






the transcriptional level and/or at the translational level. 
Logically, we would assume that those ORFs that show a 
large degree of variation in their expression are controlled at 
the transcriptional level - the variability of the mRNA 
expression is indicative of the cell controlling mRNA expres- 
sion at different points of the cell cycle to achieve the result- 
ing and desired protein levels. Thus we would expect, and we 
found, a high degree of correlation (r = 0.89) between the 
reference mRNA and protein levels for these particular 
ORFs; the cell has already put significant energy into dictating 



the final level of protein through tightly controlling the 
mRNA expression, and we assume that there would then be 
minimal control at the protein level. In contrast, those genes 
that show minimal variation in their mRNA expression 
throughout the cell cycle are more likely to have little or no 
correlation with the final protein level; the cell would be con- 
trolling these ORFs at the translational and/or post-transla- 
tional level, with the mRNA levels being somewhat 
independent of the final protein concentration. And indeed, 
we found only minimal correlation between protein and 
mRNA expression for these ORFs (r = 0.2). 
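
The variability measure used above, the standard deviation divided by the average expression level, can be sketched as below. The function names and the example profiles are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
import math


def normalized_std(levels):
    """Standard deviation divided by the mean of one ORF's expression profile.

    Uses the sample (n - 1) standard deviation, which is an assumption.
    The mean must be non-zero.
    """
    n = len(levels)
    mean = sum(levels) / n
    variance = sum((x - mean) ** 2 for x in levels) / (n - 1)
    return math.sqrt(variance) / mean


def rank_by_variability(profiles):
    """Return ORF names sorted from most to least variable."""
    return sorted(profiles, key=lambda orf: normalized_std(profiles[orf]),
                  reverse=True)


# Hypothetical cell-cycle time courses (mRNA copies per cell).
profiles = {
    "ORF_A": [10, 10, 10, 10],     # constant expression
    "ORF_B": [2, 20, 4, 18],       # strongly periodic
    "ORF_C": [100, 110, 90, 100],  # abundant but nearly constant
}
ranking = rank_by_variability(profiles)
```

Dividing by the mean is what allows an abundant but steady transcript such as the hypothetical ORF_C to rank below a scarce but strongly fluctuating one.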

Furthermore, we found that those ORFs that have higher 
than average levels of ribosomal occupancy - that is, that a 
large percentage of their cellular mRNA concentration is 
associated with ribosomes (being translated) - have well cor- 
related mRNA and protein expression levels (Figure 2). 
These cases probably represent a situation wherein the cell, 
having significantly controlled the mRNA expression to 
produce a specific level of protein, will probably not also 
employ mechanisms to control the translation. Alternatively, 
those proteins that have very low occupancy rates have 
uncorrelated mRNA and protein expression; thus, given that 
the cell has not tightly controlled the mRNA expression for 
this ORF, it will dictate the resulting protein levels through 
rigorous controls of its translation (that is, through tight 
limits on occupancy) [39]. 
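
The top-versus-bottom comparison behind these statements (the Figure 2 legend compares the 100 ORFs with the highest occupancy against the 100 with the lowest) can be sketched as follows. The record layout, function names and toy values are assumptions for illustration; the same pattern would apply to any ranking score, such as CAI or variability.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def extremes_correlation(records, n):
    """Correlate mRNA with protein in the n highest- and n lowest-scoring ORFs.

    records: list of (score, mrna, protein) tuples, where score is the
    ranking quantity, for example ribosomal occupancy (hypothetical layout).
    Returns (r_top, r_bottom).
    """
    ranked = sorted(records, key=lambda rec: rec[0], reverse=True)
    top, bottom = ranked[:n], ranked[-n:]

    def group_r(group):
        return pearson([m for _, m, _ in group], [p for _, _, p in group])

    return group_r(top), group_r(bottom)


# Toy records: (occupancy, mRNA, protein); invented values, n = 3 for brevity.
records = [
    (0.9, 1.0, 2.0), (0.8, 2.0, 4.0), (0.7, 3.0, 6.0),
    (0.3, 1.0, 3.0), (0.2, 2.0, 2.0), (0.1, 3.0, 1.0),
]
r_top, r_bottom = extremes_correlation(records, 3)
```

With real data one would pass n = 100 and the full list of ORFs for which occupancy, mRNA and protein values are all available.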

A second option for a general lack of correlation between 
mRNA and protein abundance may be that proteins have very 
different half-lives as the result of varied protein synthesis and 
degradation. Protein turnover can vary significantly depending 
on a number of different conditions [40]; the cell can control 



Figure 1 

Comparison of mRNA expression and protein abundance. (a) A plot 
comparing our mRNA reference expression set [29] with our newly 
compiled protein abundance dataset. The mRNA axis is in copies per cell; 
the protein axis is in thousand copies per cell. The protein dataset is the 
result of iteratively fitting two MudPit datasets (MudPit-1 [31] and MudPit-2 
[32]) and two two-dimensional electrophoresis datasets (2DE-1 [7] and 
2DE-2 [28]). Given the semi-quantitative nature of the MudPit data [31], we 
transformed the data into a more quantitative set by fitting each set 
individually onto our reference mRNA expression dataset. In addition, we 
fit the MudPit-1 dataset onto the more finely-grained MudPit-2 dataset. 
Each of the datasets was then moved back into 'protein space' using an 
inverse transformation derived from the 2DE-1 set, as this set has the most 
precise values. These datasets were then combined into the new reference 
abundance dataset. In cases in which there were overlapping values for a 
given ORF we used the dataset in accord with the following ordering: 
2DE-1, 2DE-2, MudPit-2, MudPit-1. The resulting reference protein abundance 
dataset (N = 2044) had a correlation of 0.66 with the mRNA reference 
dataset. (b,c) Additionally, we show that when looking at specific subsets 
(subcellular localization [52] or functional groups [34,35]) we can find both 
higher and lower correlations amongst these groups. The lower 
correlations are generally reflective of a more heterogeneous category. 
This analysis indicates that while correlations may be weak when looking at 
the global data, we tend to find higher correlations when looking at smaller 
well-defined subsets of ORFs. Further analysis is available at [33]. 
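
The precedence rule in the caption (2DE-1, then 2DE-2, then MudPit-2, then MudPit-1) can be sketched as a simple priority merge. This is only an illustration: the function name, the dictionary layout and the toy ORF values are hypothetical, and the fitting and inverse-transformation steps described in the caption are assumed to have been done already.

```python
# Precedence order taken from the caption: most trusted dataset first.
PRIORITY = ["2DE-1", "2DE-2", "MudPit-2", "MudPit-1"]


def merge_by_priority(datasets, priority=PRIORITY):
    """Return {orf: (abundance, source)} using the highest-priority dataset per ORF.

    `datasets` maps a dataset name to {orf: abundance}. Values are assumed to be
    on a common scale already (hypothetical helper, not the authors' code).
    """
    merged = {}
    for name in priority:
        for orf, abundance in datasets.get(name, {}).items():
            # setdefault keeps the first (highest-priority) value seen for an ORF.
            merged.setdefault(orf, (abundance, name))
    return merged


# Hypothetical toy values (thousand copies per cell), for illustration only.
toy = {
    "2DE-1": {"YAL003W": 110.0},
    "2DE-2": {"YAL003W": 95.0, "YBR118W": 300.0},
    "MudPit-2": {"YBR118W": 280.0, "YCR012W": 150.0},
    "MudPit-1": {"YCR012W": 170.0, "YDR050C": 60.0},
}
reference = merge_by_priority(toy)
```

Recording the source alongside each value makes it easy to check afterwards how many ORFs came from each dataset.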

Figure 2

The differences in correlation between mRNA and protein expression
values using novel categories. We see significant differences when looking
at the highest and lowest ranking of groups of ORFs in the following
categories: occupancy, CAI (codon adaptation index) value [45-47] and
variability. Occupancy refers to the percentage of transcripts associated
with ribosomes; we compared the correlation between the top 100 ORFs
and the bottom 100 in terms of occupancy (r = 0.78 versus 0.30). For the
CAI, we compared the correlation between mRNA and protein for those
ORFs with the highest CAI and those with the lowest (r = 0.48 versus
0.02). Variability refers to the normalized standard deviation (that is, the
standard deviation divided by the average expression level) for all ORFs in
the cell-cycle expression dataset of Cho et al. [38]. Here, we compared
the correlations between protein abundance and mRNA expression for
the most variable compared with the least variable proteins (r = 0.89
versus 0.20). We found significant differences between the correlations of
mRNA and protein levels for the top and bottom ranking populations for 
each of the comparisons. 
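
The top-versus-bottom comparison described in this caption can be sketched as follows. Two points are assumptions rather than facts from the source: the caption does not say which correlation coefficient was used (Pearson is assumed here), nor whether the standard deviation is the population or sample form (population is assumed). All names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def normalized_sd(values):
    """Standard deviation divided by the average expression level."""
    return pstdev(values) / mean(values)


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def top_bottom_correlations(records, n):
    """records: list of (score, mrna, protein); returns (r_top, r_bottom).

    `score` is whatever the ORFs are ranked by (occupancy, CAI or variability).
    """
    ranked = sorted(records, key=lambda rec: rec[0], reverse=True)

    def group_r(group):
        return pearson([g[1] for g in group], [g[2] for g in group])

    return group_r(ranked[:n]), group_r(ranked[-n:])


# Hypothetical toy records: (ranking score, mRNA level, protein level).
toy = [(10, 1, 2), (9, 2, 4), (8, 3, 6), (3, 1, 3), (2, 2, 2), (1, 3, 1)]
r_top, r_bottom = top_bottom_correlations(toy, 3)
```

With real data, `n` would be 100 for the occupancy comparison described above.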



the rates of degradation or synthesis for a given protein, and 
there is significant heterogeneity even within proteins that have 
similar functions [41]. Recent efforts have been made to com- 
putationally measure these rates [42]. 

Simplistically, it can be presumed that the change in a protein's
concentration over time will be equal to the rate of translation
minus the rate of degradation. By analogy to concepts in chemical
kinetics, we can approximate this with the equation
$dP(i,t)/dt = S\,E(i,t) - D\,P(i,t)$, where $P(i,t)$ is the abundance
of protein $i$ at time $t$, $E(i,t)$ is the mRNA expression level for
protein $i$, $S$ is a general rate of protein synthesis per mRNA,
and $D$ is a general rate of protein degradation per protein
[43]. Additionally, there are some experimental methods that
can also be used to measure turnover and the translational
control of protein levels [41-44].
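
A minimal numerical sketch of this rate equation, assuming a constant mRNA level E and illustrative parameter values that do not come from the source: setting the derivative to zero gives a steady-state abundance of S*E/D, and a forward-Euler integration shows the abundance relaxing toward that value.

```python
def steady_state(S, D, E):
    """Abundance at which synthesis S*E balances degradation D*P."""
    return S * E / D


def simulate(S, D, E, P0=0.0, dt=0.01, steps=10000):
    """Forward-Euler integration of dP/dt = S*E - D*P with constant E."""
    P = P0
    for _ in range(steps):
        P += dt * (S * E - D * P)
    return P


# Illustrative (hypothetical) parameters: arbitrary units, not measured values.
S, D, E = 2.0, 0.5, 10.0
p_inf = steady_state(S, D, E)
p_end = simulate(S, D, E)
```

Under this model two proteins with the same mRNA level can differ in steady-state abundance purely through D, which is one way to read the turnover argument above.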

Given the degenerate nature of the genetic code, there are 
many synonymous codons (codons that translate into the same 
amino acid). As the cell is biased in its usage of synonymous 
codons - that is, the usage of a subset of codons results in a 
higher level of mRNA expression, possibly as a result of 



differing cellular tRNA levels [45] - the codon adaptation index 
(CAI), a measurement of codon usage, can be used to predict 
the expression of a gene [46] (we recently calculated new para- 
meters for this model, with some improvement in predictive 
strength [47]). It is thought that the CAI will correlate differ- 
ently with mRNA levels than with protein abundance levels 
due, in part, to protein turnover rates [48]. Ranking the ORFs 
in terms of their CAI value, we found that although those ORFs 
that ranked the highest in terms of CAI did not show a very 
strong correlation between mRNA and protein levels, they nev- 
ertheless showed a significantly higher correlation than ORFs 
that were ranked as having the lower CAI values (r = 0.48 
versus 0.02). The low correlations reflect the fact that the CAI 
will correlate differently for protein and mRNA values because 
of the additional cellular controls on protein translation, 
namely the effect of protein turnover rates. Nevertheless, the 
sizable difference in correlations between the two groups of 
ORFs with high- and low-ranking CAI values (Figure 2) shows 
that there is some relationship between mRNA and protein 
values, possibly indicating that highly expressed genes tend to 
result in a more correlated level of protein abundance than 
lower expressed ones. 
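
For readers unfamiliar with the CAI, the sketch below uses the commonly used geometric-mean definition: each codon gets a relative adaptiveness w (its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon), and the CAI of a gene is the geometric mean of w over its codons. The re-parameterised model mentioned in the text [47] is not reproduced here; the codon families and counts are hypothetical toy values, and skipping codons without a weight is a simplification.

```python
import math


def relative_adaptiveness(codon_counts, synonymous_families):
    """w[codon] = count / (count of the most frequent synonymous codon)."""
    w = {}
    for family in synonymous_families:
        best = max(codon_counts.get(c, 0) for c in family)
        if best == 0:
            continue  # no reference data for this amino acid
        for c in family:
            w[c] = codon_counts.get(c, 0) / best
    return w


def cai(sequence, w):
    """Geometric mean of w over the codons of an in-frame coding sequence."""
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    # Simplification: codons with no weight (or zero weight) are skipped.
    logs = [math.log(w[c]) for c in codons if w.get(c, 0) > 0]
    if not logs:
        raise ValueError("no codon in the sequence has a usable weight")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference counts for two synonymous families (Ala and Lys subsets).
families = [["GCT", "GCC"], ["AAA", "AAG"]]
counts = {"GCT": 80, "GCC": 20, "AAA": 10, "AAG": 40}
w = relative_adaptiveness(counts, families)
```

A gene built only from the preferred codons scores 1.0, and the score falls as rarer synonymous codons are used.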

Correlations have been found between the mRNA expression 
levels of different protein subunits within protein complexes 
[49]. This implies that there should be, in general, a correla- 
tion between mRNA and protein abundance, as these sub- 
units provide a special case as they have to be available in 
stoichiometric amounts of proteins for the complexes to func- 
tion. Thus, we believe that a major limitation to finding corre- 
lations is the degree of natural and manufactured systematic 
noise in mRNA and protein expression experiments. There is 
a continued effort to both describe and reduce this noise [50]. 
Meanwhile, in an attempt to get around the noise one could 
look at broad categories of proteins - for example, groups 
defined by function, structure, or localization - such that the 
background noise is cancelled out to some degree [29]. 

Although proteomics is still in its infancy, given the pace of 
technological advancement in protein quantification, mRNA 
expression analysis and noise reduction, more comprehensive 
correlation studies will soon be feasible. This will allow for 
more robust analyses of the relationship between mRNA 
expression and protein abundance values. Finally, to be fully 
able to understand the relationship between mRNA and 
protein abundances, the dynamic processes involved in protein 
synthesis and degradation have to be better understood; is the 
protein level changing because of a change in the rate of protein 
synthesis, or mRNA, or protein turnover? These questions 
need to be looked into further before we can appreciate in full 
the relationship between mRNA and protein abundance levels. 

Discordant Protein and mRNA Expression in
Lung Adenocarcinomas*

Guoan Chen, Tarek G. Gharib, Chiang-Ching Huang, Jeremy M. G. Taylor,
David E. Misek, Sharon L. R. Kardia, Thomas J. Giordano, Mark D. Iannettoni,
Mark B. Orringer, Samir M. Hanash, and David G. Beer



The relationship between gene expression measured at 
the mRNA level and the corresponding protein level is not 
well characterized in human cancer. In this study, we 
compared mRNA and protein expression for a cohort of 
genes in the same lung adenocarcinomas. The abun- 
dance of 165 protein spots representing 98 individual 
genes was analyzed in 76 lung adenocarcinomas and nine 
non-neoplastic lung tissues using two-dimensional poly- 
acrylamide gel electrophoresis. Specific polypeptides 
were identified using matrix-assisted laser desorption/ 
ionization mass spectrometry. For the same 85 samples, 
mRNA levels were determined using oligonucleotide microarrays,
allowing a comparative analysis of mRNA and
protein expression among the 165 protein spots. Twenty- 
eight of the 165 protein spots (17%) or 21 of 98 genes 
(21.4%) had a statistically significant correlation between 
protein and mRNA expression (r > 0.2445; p < 0.05); 
however, among all 165 proteins the correlation coeffi- 
cient values (r) ranged from -0.467 to 0.442. Correlation 
coefficient values were not related to protein abundance. 
Further, no significant correlation between mRNA and 
protein expression was found (r = -0.025) if the average 
levels of mRNA or protein among all samples were applied 
across the 165 protein spots (98 genes). The mRNA/ 
protein correlation coefficient also varied among proteins
with multiple isoforms, indicating potentially separate
isoform-specific mechanisms for the regulation of protein
abundance. Among the 21 genes with a significant
correlation between mRNA and protein, five genes
differed significantly between stage I and stage III lung 
adenocarcinomas. Using a quantitative analysis of mRNA 
and protein expression within the same lung adenocarci- 
nomas, we showed that only a subset of the proteins 
exhibited a significant correlation with mRNA abundance. 
Molecular & Cellular Proteomics 1:304-313, 2002. 



Lung cancer is the leading cause of cancer death for both 
men and women in the United States. Adenocarcinomas of 
the lung comprise ~40% of all new cases of non-small cell
lung cancer and are now the most common histologic type.
Functional genomics, broadly defined as the comprehensive 
analysis of genes and their products, have become a recent 
focus of the life sciences (1). Application of these approaches to 
lung adenocarcinomas has the potential to aid in the identifica- 
tion of high risk patients with resectable early stage lung cancer 
that may benefit from adjuvant therapy, as well as to identify 
new therapeutic targets. In human lung cancer, however, little is 
currently understood regarding the relationship between gene 
expression as determined by measuring mRNA levels and the 
corresponding abundance of the protein products. 

A number of powerful techniques for analysis of gene ex- 
pression have been used including differential display (2), 
serial analysis of gene expression (3), DNA microarrays (4), 
and proteomics via two-dimensional polyacrylamide gel
electrophoresis and mass spectrometry (5). Bioinformatics tools
have also been developed to help determine quantitative 
mRNA/protein expression profiles of all types of cells and 
tissues (6) and now can be applied to benign and malignant 
tumors. DNA microarrays (cDNA and oligonucleotide) permit 
the parallel assessment of thousands of genes and have been 
utilized in gene expression monitoring (7), polymorphism anal- 
ysis (8), and DNA sequencing (9). Recent studies have fo- 
cused on classification or identification of subgroups of lung 
tumors using DNA microarrays (10, 11). The use of mRNA 
expression patterns by themselves, however, is insufficient for 
understanding the expression of protein products, as addi- 
tional post-transcriptional mechanisms, including protein 
translation, post-translational modification, and degradation, 
may influence the level of a protein present in a given cell or 
tissue. Proteomic analysis, a complementary technology to
DNA microarrays for monitoring gene expression, involves
protein separation and quantitative assessment of protein
spots using 2D-PAGE (see footnote 1) and protein identification using mass
spectrometry. By combining proteomic and transcriptional 
analyses of the same samples, however, it may be possible to 
understand the complex mechanisms influencing protein ex- 
pression in human cancer. 

In this study, we determined mRNA and protein levels for 
165 proteins (98 genes) in 76 lung adenocarcinomas and nine 



1 The abbreviations used are: 2D, two-dimensional; MALDI-MS, 
matrix-assisted laser desorption/ionization mass spectrometry. 
Protein and mRNA Correlation in Lung Adenocarcinomas



Table I

Correlation coefficients of protein and mRNA where only one spot was present on 2D gels
r*, correlation coefficient value > 0.2445; p < 0.05. Values in boldface are significant at p < 0.05.

Spot  Unigene    Gene name  r*       Protein name

1104  Hs.184510  SFN         0.4337  14-3-3 σ
0994  Hs.77840   ANXA4       0.4219  Annexin IV
1314  Hs.10958   DJ-1        0.3982  DJ-1 protein/MER5
1454  Hs.75428   SOD1        0.3863  Superoxide dismutase (Cu-Zn)
1638  Hs.227751  LGALS1      0.3318  Galectin 1
0264  Hs.129548  HNRPK       0.3034  Transformation up-regulated nuclear protein
1405  Hs.111334  FTL         0.2849  Ferritin light chain
0963  Hs.300711  ANXA5       0.2468  Annexin V
1252  Hs.4745    PSMC        0.2445  26 S proteasome p28
0906  Hs.234489  LDHB        0.4420  L-lactate dehydrogenase H chain (LDH-B)
1171  Hs.241515  COX11       0.2310  COX 11
1160  Hs.181013  PGAM1       0.2023  Phosphoglycerate mutase
0759  Hs.74635   DLD         0.1965  Dihydrolipoamide dehydrogenase precursor
1193  Hs.83383   AOE372      0.1932  Antioxidant enzyme AOE372
0172  Hs.3069    HSPA9B      0.1872  GRP75
0777  Hs.979     PDHB        0.1855  Pyruvate dehydrogenase E1-β subunit precursor
1249  Hs.226795  GSTP1       0.1773  Glutathione S-transferase pi (GST-pi)
1685  Hs.76136   TXN         0.1732  Thioredoxin
1205  Hs.82314   HPRT1       0.1588  HG phosphoribosyltransferase
1230  Hs.279860  TPT1        0.1466  Translationally controlled tumor protein (TCTP)
0603  Hs.181357  LAMR1       0.1463  LAMR
1358  Hs.28914   APRT        0.1399  Adenine phosphoribosyl transferase
1410  Hs.82113   DUT         0.1213  dUTP pyrophosphatase (dUTPase)
1825  Hs.112378  LIMS1       0.1213  Pinch-2 protein
0871  Hs.250502  CA8         0.1122  Carbonic anhydrase-related protein; Syntaxin
0289  Hs.82916   CCT6A       0.1106  Chaperonin-like protein
1143  Hs.11465   GSTTLp28    0.0997  Glutathione S-transferase homolog (GST homolog)
1456  Hs.118638  NME1        0.0932  Nm23 (NDPKA)
1598  Hs.278503  RIG         0.0905  RUG (U32331)
1354  Hs.89761   ATP5D       0.0904  F1F0-type ATP synthase subunit d
1445  Hs.155485  HIP2        0.0843  Huntingtin interacting protein 2 (HIP2)
1479  Hs.177486  APP         0.0746  Amyloid B4A
0608  Hs.182265  KRT19       0.0439  Cytokeratin 19
1071  Hs.10842   RAN         0.0277  GTP-binding nuclear protein RAN(TC4)
0991  Hs.297939  CTSB        0.0254  Cathepsin B
0842  Hs.77274   PLAU        0.0248  Urokinase plasminogen activator
0823  Hs.198248  B4GALT1     0.0183  β 1,4-galactosyl transferase
0613  Hs.1247    APOA4       0.0176  Apolipoprotein A4 (ApoA4)
1338  Hs.104143  CLTA        0.0123  Clathrin light chain A
0902  Hs.5123    SID6-306    0.0117  Cytosolic inorganic pyrophosphatase
1688  Hs.1473    GRP        -0.0040  Preprogastrin-releasing peptide
0265  Hs.274402  HSPA1B     -0.0071  Heat shock-induced protein
1414  Hs.77541   ARF5       -0.0096  ADP-ribosylation factor 1
0710  Hs.97206   HIP1       -0.0114  Huntingtin interacting protein 1 (HIP1)
0532  Hs.170328  MSN        -0.0132  Moesin/E
0525  Hs.284255  ALPP       -0.0148  Alkaline phosphate, placental
0513  Hs.76901   PDIR       -0.0289  Protein disulfide isomerase-related protein 5
1659  Hs.256697  HINT       -0.0312  Protein kinase C inhibitor
1262  Hs.7016    RAB7       -0.0362  Rab 7 protein
0190  Hs.184411  ALB        -0.0470  Albumin
0948  Hs.2795    LDHA       -0.0549  Lactate dehydrogenase-A (LDHA)
0502  Hs.180532  GPI        -0.0575  Hsp89
0152  Hs.75410   HSPA5      -0.0640  GRP78
1054  Hs.74276   CLIC1      -0.0686  Nuclear chloride channel (RNCC protein)
0709  Hs.253495  SFTPD      -0.0936  Pulmonary surfactant protein D
0867  Hs.78996   PCNA       -0.0982  PCNA
0165  Hs.180414  HSPA8      -0.1014  Heat shock cognate protein, 71 kDa
1109  Hs.75103   YWHAZ      -0.1018  14-3-3 ζ/δ
0137  Hs.554     SSA2       -0.1032  Ro/ss-A antigen

Table I — continued

Spot  Unigene    Gene name  r*       Protein name

0278  Hs.4112    TCP1       -0.1237  T-complex protein 1, α subunit
1769  Hs.9614    NPM1       -0.1738  B23/numatrin
0089  Hs.74335   HSPCB      -0.2049  Hsp90
2511  Hs.153179  FABP5      -0.2109  E-FABP/FABP5
1739  Hs.16488   CALR       -0.2344  Calreticulin 32
1138  Hs.301961  GSTM4      -0.2438  Glutathione S-transferase M4 (GST m4)
2533  Hs.77060   PSMB6      -0.2512  Macropain subunit A



non-neoplastic lung tissues. Protein levels were determined 
using quantitative 2D-PAGE analysis, and the separated protein
polypeptides were identified using matrix-assisted laser
desorption/ionization mass spectrometry (MALDI-MS). The 
corresponding mRNA levels for the identified proteins within 
the same samples were determined using oligonucleotide 
microarrays. Correlation analyses showed that protein abun- 
dance is likely a reflection of the transcription for a subset of 
proteins, but translation and post-translational modifications
also appear to influence the expression levels of many indi- 
vidual proteins in lung adenocarcinomas. 

EXPERIMENTAL PROCEDURES 

Tissues— Fifty-seven stage I and 19 stage III lung adenocarcino- 
mas, as well as nine non-neoplastic lung tissue samples, were used 
for protein and mRNA analyses. Patient consent was obtained, and 
the project was approved by the Institutional Review Board. All
tissues were obtained after resection at the University of Michigan
Health System between May 1991 and July 1998. Tissues were all
snap-frozen in liquid nitrogen and then stored at -80 °C. The patients 
included 46 females and 30 males ranging in age from 40.9 to 84.6 
(average 63.8) years. Most patients (66/76) demonstrated a positive 
smoking history. Sixty-one tumor samples were classified as bron- 
chial-derived, 14 were classified as bronchoalveolar, and one had 
both features. Eighteen tumor samples were classified as well differ- 
entiated, 38 were classified as moderate, and 19 were classified as 
poorly differentiated adenocarcinomas. Hematoxylin-stained cryostat 
sections (5 μm), prepared from the same tumor pieces to be utilized
for protein and mRNA isolation, were evaluated by a pathologist and
compared with hematoxylin- and eosin-stained sections made from
paraffin blocks of the same tumors. Specimens were excluded from
analysis if they showed unclear or mixed histology (e.g. adenosquamous),
tumor cellularity less than 70%, potential metastatic origin as
indicated by previous tumor history, extensive lymphocytic infiltration, 
or fibrosis or if the patient had received prior chemotherapy or 
radiotherapy. 

Oligonucleotide Array Hybridization— The HuGeneFL oligonucleotide
arrays (Affymetrix, Santa Clara, CA) containing 6800 genes were
used in this study. Total RNA was isolated from all samples using 
Trizol reagent (Invitrogen). The resulting RNA was then subjected to 
further purification using RNeasy spin columns (Qiagen). Preparation 
of cRNA, hybridization, and scanning of the HuGeneFL arrays were 
performed according to the manufacturer's protocol (Affymetrix, 
Santa Clara, CA). Data analysis was performed using GeneChip 4.0 
software. The gene expression profile of each tumor was normalized 
to the median gene expression profile for the entire sample. Details of 
data trimming and normalization are described elsewhere (11). 

2D-PAGE and Quantitative Protein Analysis— Tissue for both pro- 
tein and mRNA isolation came from contiguous areas of each sample. 
Protein separation using 2D-PAGE, silver staining, and digitization 



were performed as described previously (12, 13). Our 2D-PAGE system
allows us to run 20 gels at one time (one batch). Spot detection
and quantification were accomplished utilizing Bio Image Visage System
software (Bioimage Corp., Ann Arbor, MI). The integrated intensity
of each spot was calculated as the measured optical density
units × mm². Of the total possible 2000 spots detectable on each gel,
820 spots on the gel of each sample were matched using a Gel-ed 
match program with the same spots on a chosen "master" gel. In 
each sample, 250 ubiquitously expressed reference spots were used 
to adjust for variations between gels, such as that created by subtle 
differences in protein loading or gel staining. Slight differences be- 
cause of batch were corrected after spot-size quantification. 

Mass Spectrometry and 2D Western Blotting— Preparative 2D gels 
were run using extracts from A549 lung adenocarcinoma cells (ob- 
tained from ATCC) and using the identical experimental conditions as 
the analytical 2D gels, except 30% more protein was loaded. The 
resolved protein gels were silver-stained using successive incubations
in 0.02% sodium thiosulfate for 2 min, 0.1% silver nitrate for 40
min, and 0.014% formaldehyde plus 2% sodium carbonate for 10 
min. For protein identification, protein polypeptides underwent trypsin 
digestion followed by MALDI-MS using a MALDI-TOF Voyager-DE 
mass spectrometer (Perseptive Biosystems, Framingham, MA). The 
masses were compared with known trypsin digest databases using 
the MS-FIT database (University of California, San Francisco; 
prospector.ucsf.edu/ucsfhtml3.2/msfit.htm). Some of the polypeptides
included in the analysis had been identified prior to this study on
the basis of sequencing (14). The identified protein spots used in this
paper are shown in Fig. 1A. The method for 2D-PAGE Western blot
verification was as described previously (15). The 2D Western blots of
GRP58 and Op18 are shown in Fig. 1, C and E; the others, such as
GRP78, GRP75, HSP70, HSC70, KRT8, KRT18, KRT19, Vimentin,
ApoJ, 14-3-3, Annexin I, Annexin II, PGP9.5, DJ-1, GST-pi, and
PGAM, are described elsewhere (see footnote 2).

Statistical Analysis— Missing values were replaced with the mean 
value of the protein spot. The transform x → log(1 + x) was applied
to normalize all protein expression values. The relationship between
protein and mRNA expression levels within the same samples was
examined using the Spearman correlation coefficient analysis (16). To
identify potentially significant correlations between gene and protein
expression, we used an analytical strategy similar to SAM (significance
analysis of microarrays) (17), which uses a permutation technique
to determine the significance of changes in gene expression
between different biological states. To obtain permuted correlation
coefficients between gene and protein expression, genes were exchanged
first in such a way that permutated correlation coefficients
were calculated based on pseudo pairs of genes and proteins. The
distribution of permutated correlation coefficients became stable after 
60 permutations. This procedure was then repeated 60 times to 
obtain 60 sets of permutated correlation coefficients. For each of the 
60 permutations, the correlations of genes and proteins were ranked 



2 Chen et al., submitted for publication.

Table II

Correlation coefficients of protein and mRNA where multiple isoforms were present on 2D gels
r*, correlation coefficient value > 0.2445; p < 0.05. Values in boldface are significant at p < 0.05.

Spot  Unigene    Gene name  r*       Protein name

1494  Hs.81915   LAP18       0.4003  OP18 (Stathmin)
0957  Hs.77899   TPM1        0.3930  Tropomyosins 1-5
0353  Hs.289101  GRP58       0.3802  Protease disulfide isomerase (GRP58)
0855  Hs.169476  GAPD        0.3693  Glyceraldehyde-3-phosphate dehydrogenase
1198  Hs.41707   HSPB3       0.3668  Hsp27
1203  Hs.83848   TPI1        0.3395  Triose phosphate isomerase (TPI)
0523  Hs.65114   KRT18       0.3335  Cytokeratin 18
1492  Hs.81915   LAP18       0.3234  OP18 (Stathmin)
1493  Hs.81915   LAP18       0.3154  OP18 (Stathmin)
1181  Hs.78225   ANXA1       0.3102  Annexin variant I
0439  Hs.242463  KRT8        0.3049  Cytokeratin 8
0505  Hs.297753  VIM         0.2939  Vimentin
0593  Hs.297753  VIM         0.2809  Vimentin
1874  Hs.75313   AKR1B1      0.2790  Aldose reductase
0935  Hs.75544   YWHAH       0.2775  14-3-3 η
2524  Hs.78225   ANXA1       0.2612  Annexin I
2324  Hs.65114   KRT18       0.2601  Cytokeratin 18
1192  Hs.41707   HSPB3       0.2558  Hsp27
0350  Hs.289101  GRP58       0.2516  Phospholipase C (GRP58)
0992  Hs.75313   AKR1B1     -0.2460  Aldose reductase
0861  Hs.75313   AKR1B1      0.0761  Aldose reductase
0853  Hs.75313   AKR1B1     -0.0675  Aldose reductase
2503  Hs.76392   ALDH1      -0.0565  Aldehyde dehydrogenase
0381  Hs.76392   ALDH1      -0.0371  Aldehyde dehydrogenase
0371  Hs.76392   ALDH1      -0.0680  Aldehyde dehydrogenase
1179  Hs.78225   ANXA1       0.2052  Annexin variant I
0762  Hs.78225   ANXA1      -0.0739  Annexin I
0760  Hs.78225   ANXA1      -0.0228  Annexin I
2506  Hs.217493  ANXA2       0.2223  Lipocotin (annexin II)
0772  Hs.217493  ANXA2       0.2080  Lipocotin (annexin II)
0723  Hs.217493  ANXA2       0.0701  Lipocotin
1239  Hs.93194   APOA1       0.1133  Apolipoprotein A1 (ApoA1)
1237  Hs.93194   APOA1      -0.0373  Apolipoprotein A1 (ApoA1)
1234  Hs.93194   APOA1      -0.0894  Apolipoprotein A1 (ApoA1)
0428  Hs.25      ATP5B       0.0080  ATP synthase β subunit precursor
0427  Hs.25      ATP5B       0.0122  ATP synthase β subunit precursor
0424  Hs.25      ATP5B      -0.0992  ATP synthase β subunit precursor
0863  Hs.75106   CLU        -0.0483  Apolipoprotein J (ApoJ)
0780  Hs.75106   CLU        -0.0443  Apolipoprotein J (ApoJ)
1527  Hs.119140  EIF5A      -0.0726  eIF-5A
1484  Hs.119140  EIF5A      -0.0376  eIF-5A
1728  Hs.5241    FABP1      -0.1916  L-FABP
1712  Hs.5241    FABP1      -0.0473  L-FABP
0947  Hs.169476  GAPD        0.1745  Glyceraldehyde-3-phosphate dehydrogenase
1232  Hs.75207   GLO1        0.2249  Glyoxalase-I
1229  Hs.75207   GLO1        0.0450  Glyoxalase-I
1595  Hs.158300  HAP1       -0.0137  Huntingtin-associated protein 1 (neuroan 1)
1810  Hs.75990   HP         -0.4672  α-Haptoglobin
1459  Hs.75990   HP          0.0802  α-Haptoglobin
1458  Hs.75990   HP         -0.0305  α-Haptoglobin
0619  Hs.75990   HP          0.0461  β-Haptoglobin
0615  Hs.75990   HP         -0.0034  β-Haptoglobin
1250  Hs.41707   HSPB3      -0.1024  Hsp27
0549  Hs.79037   HSPD1       0.1074  Hsp60
0338  Hs.79037   HSPD1       0.2265  Hsp60
0333  Hs.79037   HSPD1       0.1383  Hsp60
0331  Hs.79037   HSPD1       0.1603  Hsp60
2381  Hs.65114   KRT18       0.2016  Cytokeratin 18
0535  Hs.65114   KRT18       0.1106  Cytokeratin 18

Table II — continued

Spot  Unigene    Gene name  r*       Protein name

0529  Hs.65114   KRT18       0.1279  Cytokeratin 18
0528  Hs.65114   KRT18       0.0414  Cytokeratin 18
0527  Hs.65114   KRT18       0.0436  Cytokeratin 18
0514  Hs.65114   KRT18       0.0733  Cytokeratin 18
0451  Hs.242463  KRT8       -0.0111  Cytokeratin 8
0446  Hs.242463  KRT8        0.0347  Cytokeratin 8
0444  Hs.242463  KRT8       -0.1311  Cytokeratin 8
0443  Hs.242463  KRT8        0.0942  Cytokeratin 8
1488  Hs.81915   LAP18       0.0495  OP18 (Stathmin)
0321  Hs.75655   P4HB       -0.0546  PDI (proly-4-OH-B)
0320  Hs.75655   P4HB       -0.0041  PDI (proly-4-OH-B)
1063  Hs.75323   PHB         0.0441  Prohibitin
0837  Hs.75323   PHB         0.1402  Prohibitin
0326  Hs.297681  SERPINA1   -0.0227  α-1-Antitripsin
0322  Hs.297681  SERPINA1   -0.0277  α-1-Antitripsin
0241  Hs.297681  SERPINA1   -0.0148  α-1-Antitripsin
1280  Hs.301254  SFTPA1     -0.1488  Pulmonary surfactant-associated protein
1278  Hs.301254  SFTPA1     -0.2040  Pulmonary surfactant-associated protein
0866  Hs.73980   TNNT1       0.1162  Troponin T
0778  Hs.73980   TNNT1       0.0740  Troponin T
1213  Hs.83848   TPI1        0.0024  Triose phosphate isomerase (TPI)
1210  Hs.83848   TPI1        0.0490  Triose phosphate isomerase (TPI)
1207  Hs.83848   TPI1       -0.1615  Triose phosphate isomerase (TPI)
1204  Hs.83848   TPI1        0.0209  Triose phosphate isomerase (TPI)
1202  Hs.83848   TPI1        0.0721  Triose phosphate isomerase (TPI)
1161  Hs.83848   TPI1        0.2265  Triose phosphate isomerase (TPI)
1052  Hs.77899   TPM1       -0.1040  Tropomysin clean-product
1039  Hs.77899   TPM1       -0.2999  Cytoskeletal tropomyosin
1035  Hs.77899   TPM1       -0.3821  Tropomyosin
0783  Hs.77899   TPM1        0.0757  Tropomyosins 1-5
1574  Hs.194366  TTR        -0.0065  Transthyretin
0809  Hs.194366  TTR         0.0399  Transthyretin multimere
2202  Hs.76118   UCHL1      -0.0220  Ubiquitin carboxyl-terminal hydrolase isozyme L1
1246  Hs.76118   UCHL1      -0.1261  Ubiquitin carboxyl-terminal hydrolase isozyme L1
1242  Hs.76118   UCHL1       0.1473  Ubiquitin carboxyl-terminal hydrolase isozyme L1
0606  Hs.297753  VIM         0.0951  Vimentin
0594  Hs.297753  VIM        -0.2664  Vimentin-derived protein (vid4)
0508  Hs.297753  VIM         0.1008  Vimentin-derived protein (vid2)
0419  Hs.297753  VIM         0.0032  Vimentin-derived protein (vid1)
1279  Hs.75544   YWHAH       0.0059  14-3-3 η



such that $\rho_p(i)$ denotes the $i$th largest correlation coefficient for the $p$th
permutation. Hence, the expected correlation coefficient, $\rho_E(i)$, was the
average over the 60 permutations, $\rho_E(i) = \sum_{p=1}^{60} \rho_p(i)/60$. A scatter plot of
observed correlations ($\rho(i)$) versus the expected correlations is shown in
Fig. 2D. For this study, we chose threshold $\Delta = 0.115$ so that a correlation
would be considered significant if the absolute value of the difference between
$\rho(i)$ and $\rho_E(i)$ was greater than the threshold. Twenty-nine (including one
with observed correlation coefficient -0.4672) of 165 pairs of gene and
protein expression were called significant under this criterion, and the
permuted data generated an average of 5.1 falsely significant pairs of
gene and protein expression. This provided an estimated false discovery
rate (the percentage of pairs of gene and protein expression
identified by chance) for our data set.
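
The permutation procedure described in this section can be sketched roughly as below. It is an illustration under stated assumptions, not the authors' code: it takes a precomputed matrix in which entry [i][j] is the correlation between protein i and mRNA j, uses a plain random shuffle to form pseudo pairs (so a few true pairs may survive a shuffle), and estimates the false discovery rate as the average number of permuted calls divided by the number of observed calls, as in SAM. The function name and toy matrix are hypothetical. With the figures reported above, that ratio would be about 5.1/29, roughly 18%; the text itself does not state the value.

```python
import random
from statistics import mean


def sam_like_calls(corr, delta, n_perm=60, seed=0):
    """corr[i][j] is the correlation between protein i and mRNA j.

    The diagonal holds the true gene/protein pairs; off-diagonal entries
    supply the pseudo pairs used for the permutations.
    """
    rng = random.Random(seed)
    n = len(corr)
    observed = sorted((corr[i][i] for i in range(n)), reverse=True)

    permuted = []
    for _ in range(n_perm):
        partner = list(range(n))
        rng.shuffle(partner)  # plain shuffle: some true pairs may survive
        ranked = sorted((corr[i][partner[i]] for i in range(n)), reverse=True)
        permuted.append(ranked)

    # Expected i-th largest correlation: average over the permutations.
    expected = [mean(p[i] for p in permuted) for i in range(n)]

    def n_calls(ranked):
        # A pair is called when it departs from expectation by more than delta.
        return sum(abs(r - e) > delta for r, e in zip(ranked, expected))

    called = n_calls(observed)
    false_calls = mean(n_calls(p) for p in permuted)
    fdr = false_calls / called if called else 0.0
    return observed, expected, called, false_calls, fdr


# Hypothetical 4 x 4 correlation matrix, for illustration only.
toy = [
    [0.45, 0.05, -0.02, 0.10],
    [0.03, 0.02, 0.08, -0.04],
    [-0.06, 0.04, -0.47, 0.01],
    [0.07, -0.03, 0.02, 0.05],
]
observed, expected, called, false_calls, fdr = sam_like_calls(toy, delta=0.115)
```

Because the test is on the absolute difference, strongly negative correlations can be called as well as positive ones, which matches the inclusion of the -0.4672 pair in the text.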

RESULTS 

Correlation of Individual Proteins and mRNA Expression 
within Each Tumor— We have examined quantitatively 165 



protein spots on 2D gels representing 98 genes and com- 
pared protein levels with mRNA levels for a cohort of 85 lung 
adenocarcinomas and normal lung samples. Of the 165 protein
spots, 69 proteins were represented by only one known
spot on 2D gels for an individual gene, whereas 96 protein 
spots showed multiple protein products from 29 different 
genes. 2D Western blotting verified the proteins identified by 
mass spectrometry when specific antibodies were available. 
Spearman correlation coefficients of the proteins and their 
associated mRNA for each protein spot were generated using 
all 76 lung adenocarcinomas and nine non-neoplastic lung 
tissues (see Tables I and II, and see Figs. 1 and 2). The 
correlation coefficients (r) ranged from -0.467 to 0.442 (Fig. 
2D). A total of 28 protein spots (21 genes) were found to have
a statistically significant correlation between expression of 

Fig. 1. A, digital image of a silver-stained 2D-PAGE separation of a stage I lung adenocarcinoma showing protein spots separated by
molecular mass (MW) and isoelectric point (pI). Twenty-eight protein spots whose expression levels are correlated with mRNA abundance are
indicated by the black arrows. B, the outlined areas of A showing protein GRP58. C, 2D Western blot of GRP58 from the A549 lung
adenocarcinoma cell line. D, the outlined areas of A showing the protein isoforms of Op18. E, 2D Western blot of Op18 from A549 cells.



their protein and mRNA (r > 0.2445; p < 0.05). This accounts 
for 17% (28/165) of the 165 protein spots. Among the 69 
genes for which only a single protein spot was known (Table 
I), nine genes (9/69, 13%) were observed to show a statisti- 
cally significant relationship between protein and mRNA 
abundance (r > 0.2445; p < 0.05). The proteins whose ex- 
pression levels were correlated with their mRNA abundance 
included those involved in signal transduction, carbohydrate 
metabolism, apoptosis, protein post-translational modifica- 
tion, structural proteins, and heat shock proteins (Table III). 
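
The per-spot calculation (mean imputation of missing values, the log(1 + x) transform, then a Spearman coefficient against the matching mRNA values) can be sketched as follows. The helper names and toy numbers are hypothetical, the natural logarithm is an assumption because the base is not stated, and returning 0.0 for a constant vector is a convention chosen here. Since log(1 + x) is monotonic it does not change ranks, so only the imputation step can affect the Spearman value.

```python
import math
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def log1p_transform(values):
    """Apply x -> log(1 + x); natural log assumed."""
    return [math.log1p(v) for v in values]


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant vector; 0.0 is a convention here
    return sxy / math.sqrt(sxx * syy)


# Hypothetical spot intensities and mRNA levels for one protein in six samples.
protein = log1p_transform(impute_mean([0.8, None, 2.1, 1.4, 3.0, 0.2]))
mrna = [120.0, 300.0, 410.0, 250.0, 500.0, 90.0]
r = spearman(protein, mrna)
```

Repeating this for each of the 165 spots would give the column of r values of the kind listed in Tables I and II.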

Individual Isoforms of the Same Protein Have Different 
Protein/mRNA Correlation Coefficients— Of the 165 protein 
spots, 96 represent protein products of 29 genes with at least 
two isoforms. Among these 96 protein spots, 19 (19/96 pro- 
tein spots, 20%) showed a statistically significant correlation 
between their protein and mRNA expression (r > 0.2445; p < 
0.05) (Table II) and represented 12 genes (12/29, 41%). Individual
isoforms of the same protein demonstrated different
protein/mRNA correlation coefficients. For example, 2D-PAGE/ 
Western analysis revealed four isoforms of OP18 differing with
regard to isoelectric point but similar in molecular weight.
Three of the four isoforms (spots 1492, 1493, and 1494) showed
a statistically significant correlation between their protein and
mRNA abundance (r = 0.3234, 0.3154, and 0.4003, respectively).
The fourth isoform (spot 1488) showed no correlation between
protein and mRNA expression (r = 0.0495). Similarly, just
one of five quantified isoforms of cytokeratin 8 (spot 439) dem- 
onstrated a statistically significant correlation between protein 
and mRNA abundance (r = 0.3049; p < 0.05) (Table II). 

In addition to differences in the relationship between mRNA 
levels and protein expression among separate isoforms, some 
genes with very comparable mRNA levels showed a 24-fold 
difference in their protein expression. Genes with comparable 
protein expression levels also showed up to a 28-fold vari- 
ance in their mRNA levels. 

Lack of Correlation for mRNA and Protein Expression when 
Using Average Tumor Values across All 165 Protein Spots (98 
Genes)— The relationship between mRNA and protein expres- 
sion was also examined by using the average expression 
values for all samples. To analyze this relationship using this 
approach, the average value for each protein or mRNA was 
generated using all 85 lung tissue samples. The normalized
average protein values ranged from -0.0646 to
0.0979 (raw value 0.0036 to 4.1947), and the range for mRNA
was from 0 to 15260.5 for all 165 individual protein spots. The
Spearman correlation coefficient for the whole data set (165 
protein spots/98 genes) was -0.025 (Fig. 3A). Even for the 28 
protein spots (Fig. 2D) that were found to have a statistically 
significant correlation between their mRNA and protein, use of 
Fig. 2. A-C, plots showing the correlation between mRNA and protein for the three selected genes Op18, Annexin IV, and GAPD for all 76
lung adenocarcinomas and nine non-neoplastic lung samples (p < 0.05). D, distribution of all 165 Spearman correlation coefficients (r) and
verification analysis using SAM. A more detailed description of the method is provided under "Experimental Procedures." Approximately 17%
of the 165 proteins demonstrate a significant correlation between mRNA and protein levels as demonstrated by the values shown beyond the
outer range of threshold Δ = 0.115. Normalized protein values were used, thus negative values for some proteins are observed.



the average value resulted in a correlation coefficient value of
-0.035, which was not significant (Fig. 3B).
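
The distinction between the two analyses (correlation across samples for one spot versus correlation of average values across spots) can be sketched in Python as follows. The four spots and all numbers are invented toy data, chosen only so that every spot correlates perfectly within itself while the averages do not; the helper names are hypothetical.

```python
def ranks(values):
    # Assumes no tied values, which holds for the toy data below.
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out

def spearman(x, y):
    # Closed form for untied data: r = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def mean(values):
    return sum(values) / len(values)

# Hypothetical data: keys are protein spots, list entries are samples.
protein = {
    "spot_a": [1.0, 2.0, 3.0, 4.0],
    "spot_b": [10.0, 20.0, 30.0, 40.0],
    "spot_c": [5.0, 6.0, 7.0, 8.0],
    "spot_d": [0.1, 0.2, 0.3, 0.4],
}
mrna = {
    "spot_a": [50.0, 60.0, 70.0, 80.0],
    "spot_b": [1.0, 2.0, 3.0, 4.0],
    "spot_c": [100.0, 200.0, 300.0, 400.0],
    "spot_d": [20.0, 30.0, 40.0, 50.0],
}

spots = sorted(protein)
# Analysis 1: one coefficient per spot, computed across samples.
per_spot = {s: spearman(protein[s], mrna[s]) for s in spots}
# Analysis 2: one coefficient for the data set, computed across spot averages.
across = spearman([mean(protein[s]) for s in spots],
                  [mean(mrna[s]) for s in spots])
print(per_spot, across)
```

The toy data make the point that strong within-spot correlations place no constraint on the correlation of averages, because absolute protein and mRNA scales differ from gene to gene.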

Lack of a Relationship between Protein/mRNA Correlation 
Coefficients and Average Protein Abundance— To determine 
whether an absolute protein level might influence the correlation
with mRNA, the mean value of each protein (relative
abundance) and the Spearman protein/mRNA correlation coefficients
among all 85 samples were examined. No relationship
between the protein abundance and the correlation coefficients
was observed (r = 0.039; p > 0.05). A detailed
analysis of separate subsets of proteins with differing levels of
abundance (less than -0.0014, larger than -0.0014, or larger
than 0.0077) also showed a lack of correlation between mRNA 
and protein expression among the 83 (50%), 82 (50%), and 41 
(25%) of 165 total protein spots, respectively (r = 0.016, 0.08, 
and 0.172, respectively). 

Stage-related Changes in the Protein/mRNA Correlation 
Coefficients— To determine whether the 21 genes (28 protein 
spots) showing a significant correlation between the protein 
and mRNA expression among all samples demonstrate 
changes in this relationship during tumor progression, the 
correlations were examined separately for stage I (n = 57) and
stage III (n = 19) lung adenocarcinomas (Table III). The number
of non-neoplastic lung samples (n = 9) was insufficient for
a separate correlation analysis of this group. Many of the 
protein spots represent one of several known protein isoforms 
for a given gene. The majority of genes (16/21) did not differ in 
the protein/mRNA correlation between stage I and stage III 
tumors indicating a similar regulatory relationship between the 
mRNA and protein spot. GRP-58, PSMC, SOD1, TPI1, and 
VIM, however, were found to demonstrate significant differ- 
ences in the correlation coefficients between stage I and 
stage III lung adenocarcinomas. For GRP-58, PSMC, and VIM
the change in the correlation coefficient was due to a
relative increase in protein expression in stage III tumors. For
SOD1 and TPI1 the change resulted from a relative decrease in
expression of the specific protein in stage III tumors.

DISCUSSION 

Relatively little is known about the regulatory mechanisms 
controlling the complex patterns of protein abundance and 
post-translational modification in tumors. Most reports con- 
cerning the regulation of protein translation have focused on 
Table III

Stage-dependent analysis of protein-mRNA correlation coefficients
r, correlation coefficient. Values in boldface indicate a significant difference between stage I and stage III.

Spot    Gene name    r (stage I)    r (stage III)
1874    AKR1B1        0.269          0.106
2524    ANXA1         0.184          0.572
0994    ANXA4         0.660          0.362
0963    ANXA5         0.241          0.390
1314    DJ-1          0.363          0.354
1405    FTL           0.126          0.358
0855    GAPD          0.243          0.581
0350    GRP58         0.327         -0.087
0264    HNRPK         0.360          0.243
1192    HSPB3         0.457          0.633
0523    KRT18         0.115          0.371
0439    KRT8          0.323          0.436
1492    LAP18         0.483          0.663
1638    LGALS1        0.200          0.528
1252    PSMC          0.253          0.060
1104    SFN           0.465          0.475
1454    SOD1          0.352          0.079
1203    TPI1          0.378          0.009
0957    TPM1          0.475          0.225
0593    VIM          -0.054          0.556
0935    YWHAH         0.283          0.210



one or several protein products (18). Celis et al. (19) found a 
good correlation between transcript and protein levels among 
40 well resolved, abundant proteins using a proteomic and 
microarray study of bladder cancer. By comparing the mRNA 
and protein expression levels within the same tumor samples, 
we found that 17% (28/165) of the protein spots (21/98 genes)
show a statistically significant correlation between mRNA and 
protein. These proteins appear to represent a diverse group of 
gene products and include those involved in signal transduc- 
tion, carbohydrate metabolism, protein modification, cell struc- 
ture, heat shock, and apoptosis. These results suggest that 
expression of this subset of 165 proteins is likely to be regulated
at the transcriptional level in these tissues. The majority of the 
protein isoforms, however, did not correlate with mRNA levels, 
and thus their expression is regulated by other mechanisms. We 
also observed a subset of proteins that demonstrated a nega- 
tive correlation with the mRNA expression values; for example 
α-haptoglobin demonstrated a strong negative correlation with
its mRNA expression values. This may reflect negative feedback 
on the mRNA or the protein or the presence of other regulatory 
influences that are not understood currently. 

Post-translational modification or processing will result in 
individual protein products of the same gene migrating to 
different locations on 2D-PAGE gels (20). Because the identity 
of all possible isoforms for each protein examined has not 
been characterized completely, this may influence the corre- 
lation analyses performed in this study. This is partly because 
of limitations of the 2D-PAGE and mass spectrometry tech- 
nologies (21, 22). Potential inconsistencies between mRNA 
and protein correlations that have been reported may also be 
because of differences, even in the same gene, in the mech- 



Table III (continued)

Gene name    Function
AKR1B1       Carbohydrate metabolism; electron transporter
ANXA1        Phospholipase inhibitor; signal transduction
ANXA4        Phospholipase inhibitor
ANXA5        Phospholipase inhibitor; calcium binding; phospholipid binding
DJ-1         Signal transduction
FTL          Iron storage protein
GAPD         Carbohydrate metabolism (glycolysis regulation)
GRP58        Signal transduction; protein disulfide isomerase
HNRPK        RNA-binding protein (RNA processing/modification)
HSPB3        Heat shock protein
KRT18        Structural protein
KRT8         Structural protein
LAP18        Signal transduction; cell growth and maintenance
LGALS1       Apoptosis; cell adhesion; cell size control
PSMC         Protein degradation
SFN          Signal transduction (protein kinase C inhibitor)
SOD1         Oxidoreductase
TPI1         Carbohydrate metabolism
TPM1         Structural protein (muscle); control of heart
VIM          Structural protein
YWHAH        Signal transduction



anisms of protein translation among different cells or as 
measured in different laboratories (23). 

In this study, we examined 165 protein spots identified in 
lung adenocarcinomas. Ninety-six protein spots, representing 
the products of 29 genes, contained at least two protein 
isoforms. Nineteen of 96 protein spots, representing 12 
genes, were shown to have a statistically significant correla- 
tion between their protein and mRNA expression, suggesting 
that the levels of these proteins reflect the transcription of the
corresponding genes. Differences in protein/mRNA correlations
were found among the individual isoforms of a given protein. For
example, of the four OP18 isoforms, three showed a statistically
significant correlation between the protein and mRNA expres- 
sion levels. The lack of relationship for the one isoform, how- 
ever, indicates that individual protein isoforms of the same gene 
product can be regulated differentially. This is not unexpected 
and likely reflects other post-translational mechanisms that can 
influence isoform abundance in tissues and cancer. 

In addition to the analyses of the correlation of mRNA/ 
protein within the same tumor samples, we also tested the 
global relationship between mRNA and the corresponding 
protein abundance across all 165 protein spots in the lung 
samples. A protein and mRNA average value for each gene 
was generated using all 85 lung tissue samples. We observed
a very wide range of normalized average protein and
mRNA values. The correlation coefficient generated using this 
average value data set was -0.025, and even for the 28 
protein spots that showed a statistically significant correlation 
between individual mRNA and proteins, the correlation value 
was only -0.035. This suggests that it is not possible to 
predict overall protein expression levels based on average 
Fig. 3. The overall correlation of mRNA and protein levels across all 165 protein spots (A) and across 28
protein spots that contained individual r values larger than 0.244 (B) are shown. Each protein or mRNA mean
value was calculated based on all 76 lung adenocarcinomas and nine non-neoplastic lung samples using
quantitative 2D-PAGE and Affymetrix oligonucleotide microarrays. The Spearman correlation coefficients for
the two data sets (A and B) were -0.025 and -0.035, respectively, indicating a lack of correlation if mean
values for mRNA and protein for all samples are used.

mRNA abundance in lung cancer samples. This conclusion is
also supported by previous results from Anderson and Seil- 
hamer (24), who examined 19 genes in human liver cells, and 
by Gygi et al. (25), who examined 106 genes in yeast. Both 
studies found a lack of correlation between mRNA and protein 
expression when average or overall levels were used. 

A good correlation was reported when the 11 most abundant
proteins were examined in yeast (25), suggesting that the
level of protein abundance may be a factor that may influence 
the correlation between mRNA and protein. In the present 
study, a fairly wide range of mean protein values among 165 
protein spots in lung adenocarcinomas was observed, and 
the correlation coefficients also varied from -0.467 to 0.442. 



A comparison between the mean value of each protein and 
the correlation coefficient generated using all 85 tissue sam- 
ples did not reveal a strong relationship between the overall 
protein abundance and the correlation coefficients (r = 0.039; 
p > 0.05). Detailed analysis of different subsets of protein abundance
also failed to show a correlation between mRNA and
protein expression. Thus in contrast to yeast, a relationship 
between mRNA/protein correlation coefficient and protein 
abundance in human lung adenocarcinomas was not observed. 

The results of this study indicate that the level of protein 
abundance in lung adenocarcinomas is associated with the 
corresponding levels of mRNA in 17% (28 proteins) of the 
total 165 protein spots examined. This was substantially 
higher than the amount predicted to result by chance alone
(which was 5.1) and suggests that a transcriptional mecha- 
nism likely underlies the abundance of these proteins in lung 
adenocarcinomas. We also demonstrate that the expression 
of individual isoforms of the same protein may or may not 
correlate with the mRNA, indicating that separate and likely 
post-translational mechanisms account for the regulation of 
isoform abundance. These mechanisms may also account for 
the differences in the correlation coefficients observed between 
stage I and stage III tumors, indicating that specific protein 
isoforms show regulatory changes during tumor progression. 
Further studies in lung adenocarcinomas will examine the relationship
between the expression of individual protein isoforms
and specific clinical-pathological features of these tumors, such 
as the presence of angiolymphatic invasion, and nodal or pleu- 
ral surface involvement. The potential to identify specific protein 
isoforms associated with biological behavior in lung adenocar- 
cinomas would be of considerable interest and will add to our 
understanding of the regulation of gene products by transcrip- 
tional, translational, and post-translational mechanisms. 
(or perhaps was exacerbated by) a UK
government seen to be welcoming of GM 
foods and crops. Another negative was that 
it was major transnational corporations — 
another questionable community in the 
eyes of much of the public here — that 
were seeking to push their new products 
onto the public without previous debate 
and without there being any perceptible 
benefit. And finally, the potentially negative 
impact of GM crops on organic farmers — 
who are seen by some as crucially 
important for the sustainable future of 
food production — and the relatively small 
scale of agricultural production in the 
United Kingdom (and Europe) have also 
been important issues. 

The question to be answered, therefore, 
is not how to force the EU to accept GM 
foods and crops against its own public 
opinion, but how to change public opinion 
in the EU. The UK government is currently 
conducting several exercises that it hopes 
will provide the facts to support a relaxation 
of the moratorium on growing GM crops. 
These include a major review of the costs 
and benefits of GM crops (just finished), 
a scientific review of the issues (also now 
finished), a series of crop trials (results in
September) and a public debate on GM 
crops, 'GM nation' (just finished). 

Whether these will change attitudes is 
moot: the costs-and-benefits review has 
concluded that the economic value of the 
few currently available GM crops that could 
be grown in the UK is likely to be limited 
because of negative consumer attitudes to 
GM foods. 



To the editor:

Several articles in the July and August issues of Nature Biotechnology (21, 735-738, 2003;
21, 852-854, 2003) discuss whether the US strategy of forcing the European Union
(EU; Brussels, Belgium) to accept GM foods by referring to World Trade Organisation
(WTO; Geneva, Switzerland) rules will bear fruit. We do not believe so; rather the
opposite.

A central claim in the arguments of both President Bush and US commerce
representative Robert B. Zoellick is that the risk of GM foods is negligible. The veracity
of that statement, however, depends on what is defined as risk. A common understanding
is that risk relates to the environment and human health. On the other hand, recent
studies have repeatedly shown that public hesitance also includes a number of ethical
issues (e.g., market dominance of a few large companies and GM crops threatening
natural or divine orders, refs 1,2). Our worry is that the US government is
neglecting widespread concerns of the European public that include more
than environmental risk and human health.

Research carried out by our group in Denmark¹ indicates that, although
many people are confident that the public authorities are able to manage the risks
here and now, people are less confident about their ability to handle long-term effects
because of the scientific uncertainty. Attempts to conceal these or other limits to
scientific knowledge do not prevent controversies from arising; rather, the
opposite happens because trust in business, scientific experts and public authorities is
undermined (witness the handling of the BSE controversy in the United Kingdom).

In the long run, a policy of openness about the different dimensions of uncertainty
would be more likely to increase trust in scientific risk assessment. Of course, this
will not guarantee public acceptance of GM food, but experience in Europe shows that
transparency and dialog are prerequisites for decreasing concerns about new technology.

The argument that the EU's resistance to GM food has had negative consequences
for developing countries, denying them access to a technology that could alleviate
food provision, is regarded sympathetically by many among the European public.
Indeed, here most people abandon the simple dichotomy between 'unacceptable' GM food and
the much more acceptable medical uses. This is because GM foods are seen as a means to help
people in distress. Many counter such humanitarian uses, however, by the
observation that, in general, GM crops are developed not to benefit people in the
developing world, but to make money. Needless to say, according to those who
point this out, making money is not in itself an acceptable objective. Thus, the fear
is that the benefits will never accrue to those who are at present suffering.

Kristian Borch, Jesper Lassen
& Rikke B Jørgensen

Centre for Bioethics and Risk Assessment,
Systems Analysis Department,
PO Box 49, DK-4000 Roskilde, Denmark
e-mail: kristian.borch@risoe.dk


Mining the literature and large 
datasets 



To the editor: 

In the accelerating quest for disease 
biomarkers, the use of high-throughput 
technologies, such as DNA microarrays 
and proteomics experiments, has produced 
vast datasets identifying thousands of 
genes whose expression patterns differ in 
diseased versus normal samples. Although 
many of these differences may reach 
statistical significance, they are not always 
biologically meaningful. For example, 
reports of mRNA or protein changes of 
as little as two-fold are not uncommon, 
and although some changes of this 
magnitude turn out to be important, most 



are attributable to disease-independent 
differences between the samples. Evidence 
gleaned from other studies linking genes to 
the disease is helpful, but with such large 
datasets, a manual literature review is often 
not practical. Thus, the power of these 
emerging technologies — the ability to 
quickly generate large sets of data — has 
challenged current means of evaluating 
and validating these data. One study from 
1999, for example, reveals that a researcher 
would have to scan 130 different journals 
and read 27 papers per day to follow a 
single disease, such as breast cancer¹.
To address this need, my group at
Harvard recently developed a freely
accessible automated literature-mining
tool, termed MedGene, that
comprehensively summarizes the 
relationships among over 50,000 named 
human genes (and their synonyms) and 
over 4,000 human diseases from over 
12 million records in Medline 
(http://hipseq.med.harvard.edu/MedGene). 
Several key features of this resource are 
worth noting. First, MedGene is not 
limited to any specific relationship type, 
but rather encompasses all reported 
gene-disease links, including the 
genetic, biochemical, pharmacological, 
epidemiological and physiological. Second, 
the database assigns a mathematical score 
summarizing the strength of the 
association between the disease and the 
gene, which allows semiquantitative 
analysis and organizes the genes in rank 
order. Finally, the relationships are 
identified automatically by advanced text 
searching and filtering algorithms that 
result in low rates of false-positive and false-negative
linkages². In one query, MedGene
identified nearly 2,400 breast



cancer-related genes, whereas the same 
search in four commonly used databases 
yielded a combined total of 286 genes, 
260 of which were included in the 
MedGene list¹⁻⁵.

A summary of all gene-disease 
relationships offers the unique opportunity 
to both evaluate and validate the outcome 
of high-throughput experiments. For 
example, we used MedGene to analyze a 
DNA microarray experiment in which over 
2,000 genes demonstrated statistically 
significant differences in expression 
between normal breast tissue and breast 
cancer. It was able to identify the subset of 
these genes previously described as breast 
cancer-related genes in the literature. To 
determine whether gene expression level 
correlated with the strength of the 
association between gene and breast 
cancer, we plotted gene expression levels 
against the breast cancer literature 
relationship scores assigned by MedGene. 
Interestingly, there is no correlation when considering
expression differences as high as fivefold;
however, a significant correlation is
observed (r = 0.41; P = 0.05) among genes



showing a difference of tenfold or more. 
Thus, for this experiment, expression level 
differences as high as fivefold cannot be 
attributed to the disease without 
corroborating evidence. It will be 
interesting to learn if similar results hold 
for other diseases and other experiments. 
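
A stratified comparison of this kind can be sketched in Python as follows. The (fold change, literature score) pairs are invented so that the two strata behave differently; they are not MedGene output, and the function names are hypothetical. Pearson correlation is assumed here because the letter does not say which coefficient was used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_in_range(pairs, low, high):
    """Correlate fold change with score for genes with low <= fold <= high."""
    subset = [(fold, score) for fold, score in pairs if low <= fold <= high]
    folds = [fold for fold, _ in subset]
    scores = [score for _, score in subset]
    return pearson(folds, scores)

# Hypothetical (expression fold change, literature relationship score) pairs.
genes = [(2, 1), (3, 3), (4, 3), (5, 1), (10, 2), (12, 4), (15, 7), (20, 12)]

r_up_to_fivefold = correlation_in_range(genes, 0, 5)
r_tenfold_or_more = correlation_in_range(genes, 10, float("inf"))
print(r_up_to_fivefold, r_tenfold_or_more)
```

With real data each stratum would contain many genes, and the coefficient would be accompanied by a significance test before any conclusion was drawn.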

As the search for disease biomarkers and 
drug targets comes to rely increasingly 
upon genomic-scale technologies, demand 
will grow for automated resources, such as 
MedGene, that help process the resulting 
data volume. 

Joshua LaBaer 

Institute of Proteomics, 
Harvard Medical School 
250 Longwood Ave., BCMP,
Boston, Massachusetts 02115, USA
e-mail: josh@hms.harvard.edu 

1. Baasiri, R.A., Glasser, S.R., Steffen, D.L. &
Wheeler, D.A. Oncogene 18, 7958-7965 (1999).

2. Hu, Y. et al. J. Proteome Res. 2, 405-412 (2003).

3. Steffen, D.L., Levine, A.E., Yarus, S., Baasiri, R.A.
& Wheeler, D.A. Bioinformatics 16, 639-649
(2000).

4. Bairoch, A. & Apweiler, R. Nucleic Acids Res. 28,
45-48 (2000).

5. Rebhan, M., Chalifa-Caspi, V., Prilusky, J. & Lancet,
D. Trends Genet. 13, 163 (1997).



NATURE BIOTECHNOLOGY VOLUME 21 NUMBER 9 SEPTEMBER 2003 



977 



Molecular and Cellular Biology, Mar. 1999, p. 1720-1730 Vol. 19, No. 3 

0270-7306/99/$04.00+0

Copyright © 1999, American Society for Microbiology. All Rights Reserved. 



Correlation between Protein and mRNA Abundance in Yeast 
We have determined the relationship between mRNA and protein expression levels for selected genes
expressed in the yeast Saccharomyces cerevisiae growing at mid-log phase. The proteins contained in total yeast 
cell lysate were separated by high-resolution two-dimensional (2D) gel electrophoresis. Over 150 protein spots 
were excised and identified by capillary liquid chromatography-tandem mass spectrometry (LC-MS/MS).
Protein spots were quantified by metabolic labeling and scintillation counting. Corresponding mRNA levels 
were calculated from serial analysis of gene expression (SAGE) frequency tables (V. E. Velculescu, L. Zhang, 
W. Zhou, J. Vogelstein, M. A. Basrai, D. E. Bassett, Jr., P. Hieter, B. Vogelstein, and K. W. Kinzler, Cell 
88:243-251, 1997). We found that the correlation between mRNA and protein levels was insufficient to predict 
protein expression levels from quantitative mRNA data. Indeed, for some genes, while the mRNA levels were 
of the same value the protein levels varied by more than 20-fold. Conversely, invariant steady-state levels of 
certain proteins were observed with respective mRNA transcript levels that varied by as much as 30-fold. 
Another interesting observation is that codon bias is not a predictor of either protein or mRNA levels. Our 
results clearly delineate the technical boundaries of current approaches for quantitative analysis of protein 
expression and reveal that simple deduction from mRNA transcript analysis is insufficient. 



The description of the state of a biological system by the 
quantitative measurement of the system constituents is an es- 
sential but largely unexplored area of biology. With recent 
technical advances including the development of differential 
display-PCR (21), of cDNA microarray and DNA chip tech- 
nology (20, 27), and of serial analysis of gene expression 
(SAGE) (34, 35), it is now feasible to establish global and 
quantitative mRNA expression profiles of cells and tissues in 
species for which the sequence of all the genes is known. 
However, there is emerging evidence which suggests that 
mRNA expression patterns are necessary but are by them- 
selves insufficient for the quantitative description of biological 
systems. This evidence includes discoveries of posttranscrip- 
tional mechanisms controlling the protein translation rate (15), 
the half-lives of specific proteins or mRNAs (33), and the 
intracellular location and molecular association of the protein 
products of expressed genes (32). 

Proteome analysis, defined as the analysis of the protein 
complement expressed by a genome (26), has been suggested 
as an approach to the quantitative description of the state of a 
biological system by the quantitative analysis of protein expres- 
sion profiles (36). Proteome analysis is conceptually attractive 
because of its potential to determine properties of biological 
systems that are not apparent by DNA or mRNA sequence 
analysis alone. Such properties include the quantity of protein 
expression, the subcellular location, the state of modification, 
and the association with ligands, as well as the rate of change 
with time of such properties. In contrast to the genomes of a
number of microorganisms (for a review, see reference 11) and 
the transcriptome of Saccharomyces cerevisiae (35), which have 
been entirely determined, no proteome map has been com- 
pleted to date. 

The most common implementation of proteome analysis is 
the combination of two-dimensional gel electrophoresis (2DE) 
(isoelectric focusing-sodium dodecyl sulfate [SDS]-polyacrylamide
gel electrophoresis) for the separation and quantitation
of proteins with analytical methods for their identification. 
2DE permits the separation, visualization, and quantitation of 
thousands of proteins reproducibly on a single gel (18, 24). By 
itself, 2DE is strictly a descriptive technique. The combination 
of 2DE with protein analytical techniques has added the pos- 
sibility of establishing the identities of separated proteins (1, 2) 
and thus, in combination with quantitative mRNA analysis, of 
correlating quantitative protein and mRNA expression mea- 
surements of selected genes. 

The recent introduction of mass spectrometric protein anal- 
ysis techniques has dramatically enhanced the throughput and 
sensitivity of protein identification to a level which now permits 
the large-scale analysis of proteins separated by 2DE. The 
techniques have reached a level of sensitivity that permits the 
identification of essentially any protein that is detectable in the 
gels by conventional protein staining (9, 29). Current protein 
analytical technology is based on the mass spectrometric gen- 
eration of peptide fragment patterns that are idiotypic for the 
sequence of a protein. Protein identity is established by corre- 
lating such fragment patterns with sequence databases (10, 22, 
37). Sophisticated computer software (8) has automated the 
entire process such that proteins are routinely identified with 
no human interpretation of peptide fragment patterns. 
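
As a rough illustration of how a predicted fragment pattern can be generated from a candidate sequence before it is compared with an observed spectrum, the Python sketch below computes the two singly charged fragment series read from the N terminus and from the C terminus (conventionally called b and y ions; those labels, the function name, and the example peptide are additions for illustration). It assumes unmodified residues and monoisotopic masses, and it is not the database-search software cited in the text.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O retained by C-terminal fragments
PROTON = 1.00728   # charge carrier for singly protonated ions

def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:            # b1 .. b(n-1), read from the N terminus
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # y1 .. y(n-1), read from the C terminus
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions

# Hypothetical example peptide.
b_series, y_series = fragment_ions("PEPTIDE")
for i, (b, y) in enumerate(zip(b_series, y_series), start=1):
    print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

A search program would generate such lists for every candidate peptide of matching precursor mass in the sequence database and score each against the measured MS/MS peaks.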

In this study, we have analyzed the mRNA and protein levels 
of a group of genes expressed in exponentially growing cells of 
the yeast S. cerevisiae. Protein expression levels were quantified 
by metabolic labeling of the yeast proteins to a steady state, 
followed by 2DE and liquid scintillation counting of the se- 
lected, separated protein species. Separated proteins were 
identified by in-gel tryptic digestion of spots with subsequent 
analysis by microspray liquid chromatography-tandem mass 
spectrometry (LC-MS/MS) and sequence database searching. 
The corresponding mRNA transcript levels were calculated 
from SAGE frequency tables (35). 

This study, for the first time, explores a quantitative com- 
parison of mRNA transcript and protein expression levels for 
a relatively large number of genes expressed in the same met- 
abolic state. The resultant correlation is insufficient for predic- 
FIG. 1. Schematic illustration of proteome analysis by 2DE and mass spectrometry. In part I, proteins are separated by 2DE, stained spots are excised and subjected
to in-gel digestion with trypsin, and the resulting peptides are separated by on-line capillary high-performance liquid chromatography. In part II, a peptide is shown
eluting from the column in part I. The peptide is ionized by electrospray ionization and enters the mass spectrometer. The mass of the ionized peptide is detected, and
the first quadrupole mass filter allows only the specific mass-to-charge ratio of the selected peptide ion to pass into the collision cell. In the collision cell, the energized,
ionized peptides collide with neutral argon gas molecules. Fragmentation of the peptide is essentially random but occurs mainly at the peptide bonds, resulting in smaller
peptides of differing lengths (masses). These peptide fragments are detected as a tandem mass (MS/MS) spectrum in the third quadrupole mass filter where two ion
series are recorded simultaneously, one each from sequencing inward from the N and C termini of the peptide, respectively. In part III, the MS/MS spectrum from the
selected, ionized peptide is compared to predicted tandem mass spectra computer generated from a sequence database. Provided that the peptide sequence exists in
the database, the peptide and, by association, the protein from which the peptide was derived can be identified. Unambiguous protein identification is attained in a single
analysis because multiple peptides are identified as being derived from the same protein.



tion of protein levels from mRNA transcript levels. We have 
also compared the relative amounts of protein and mRNA 
with the respective codon bias values for the corresponding 
genes. This comparison indicates that codon bias by itself is 
insufficient to accurately predict either the mRNA or the pro- 
tein expression levels of a gene. In addition, the results dem- 
onstrate that only highly expressed proteins are detectable by 
2DE separation of total cell lysates and that therefore the 
construction of complete proteome maps with current technol- 
ogy will be very challenging, irrespective of the type of organ- 
ism. 

MATERIALS AND METHODS 

Yeast strain and growth conditions. The source of protein and message tran-
scripts for all experiments was YPH499 (MATa ura3-52 lys2-801 ade2-101
leu2-Δ1 his3-Δ200 trp1-Δ63) (30). Logarithmically growing cells were obtained by
growing yeast cells to early log phase (3 × 10^6 cells/ml) in YPD rich medium
(YPD supplemented with 6 mM uracil, 4.8 mM adenine, and 24 mM tryptophan)
at 30°C (35). Metabolic labeling of protein was accomplished in YPD medium
exactly as described elsewhere (4) with the exception that 1 ml of cells was
labeled with 3 mCi to offset methionine present in YPD medium. Protein was
harvested as described by Garrels and coworkers (12). Harvested protein was
lyophilized, resuspended in isoelectric focusing gel rehydration solution, and
stored at -80°C.

2DE. Soluble proteins were run in the first dimension by using a commercial
flatbed electrophoresis system (Multiphor II; Pharmacia Biotech). Immobilized
polyacrylamide gel (IPG) dry strips with nonlinear pH 3.0 to 10.0 gradients
(Amersham-Pharmacia Biotech) were used for the first-dimension separation.
Forty micrograms of protein from whole-cell lysates was mixed with IPG strip
rehydration buffer (8 M urea, 2% Nonidet P-40, 10 mM dithiothreitol), and 250
to 380 µl of solution was added to individual lanes of an IPG strip rehydration
tray (Amersham-Pharmacia Biotech). The strips were allowed to rehydrate at
room temperature for 1 h. The samples were run at 300 V-10 mA-5 W for 2 h,
then ramped to 3,500 V-10 mA-5 W over a period of 3 h, and then kept at 3,500
V-10 mA-5 W for 15 to 19 h. At the end of the first-dimension run (60 to 70
kV·h), the IPG strips were reequilibrated for 8 min in 2% (wt/vol) dithiothreitol in
2% (wt/vol) SDS-6 M urea-30% (wt/vol) glycerol-0.05 M Tris-HCl (pH 6.8) and
for 4 min in 2.5% iodoacetamide in 2% (wt/vol) SDS-6 M urea-30% (wt/vol)
glycerol-0.05 M Tris-HCl (pH 6.8). Following reequilibration, the strips were
transferred and apposed to 10% polyacrylamide second-dimension gels. Poly-
acrylamide gels were poured in a casting stand with 10% acrylamide-2.67%
piperazine diacrylamide-0.375 M Tris base-HCl (pH 8.8)-0.1% (wt/vol) SDS-0.05%
(wt/vol) ammonium persulfate-0.05% TEMED (N,N,N',N'-tetramethylethylenediamine)
in Milli-Q water. The apparatus used to run second-dimension gels
was a noncommercial apparatus from Oxford Glycosciences, Inc. Once the IPG
strips were apposed to the second-dimension gels, they were immediately run at
50 mA (constant)-500 V-85 W for 20 min, followed by 200 mA (constant)-500
V-85 W until the buffer front line was 10 to 15 mm from the bottom of the gel.
Gels were removed and silver stained according to the procedure of Shevchenko
et al. (29).

FIG. 2. 2D silver-stained gel of the proteins in yeast total cell lysate. Proteins were separated in the first dimension (horizontal) by isoelectric focusing and then in
the second dimension (vertical) by molecular weight sieving. Protein spots (156) were chosen to include the entire range of molecular weights, isoelectric focusing points,
and staining intensities. Spots were excised, and the corresponding protein was identified by mass spectrometry and database searching. The spots are labeled on the
gel and correspond to the data presented in Table 1. Molecular weights are given in thousands.
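
The first-dimension focusing program quoted in the 2DE paragraph can be checked against the stated total of 60 to 70 kV·h with a short calculation. The sketch below is an illustration only: it assumes that the voltage limit (not the current or power limit) was the active constraint throughout and that the ramp from 300 to 3,500 V was linear, neither of which is stated in the text. The function names are hypothetical.

```python
def volt_hours(segments):
    """Integrate voltage over time for (start_V, end_V, hours) segments.

    Each segment is treated as a straight line in voltage, so its
    contribution is the mean voltage multiplied by its duration.
    """
    return sum((v0 + v1) / 2.0 * hours for v0, v1, hours in segments)


def first_dimension_kvh(hold_hours):
    """Total kV*h for the focusing program with a given final hold time."""
    program = [
        (300, 300, 2),              # 300 V for 2 h
        (300, 3500, 3),             # ramp to 3,500 V over 3 h (assumed linear)
        (3500, 3500, hold_hours),   # hold at 3,500 V for 15 to 19 h
    ]
    return volt_hours(program) / 1000.0


if __name__ == "__main__":
    for hold in (15, 19):
        print(hold, "h hold:", round(first_dimension_kvh(hold), 1), "kV*h")
```

Under these assumptions the program integrates to roughly 59 kV·h for a 15-h hold and 73 kV·h for a 19-h hold, which brackets the 60 to 70 kV·h given in the text.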

Protein identification. Gels were exposed to X-ray film overnight, and then the 
silver staining and film were used to excise 156 spots of varying intensities, 
molecular weights, and isoelectric focusing points. In order to increase the 
detection limit by mass spectrometry, spots were cut out and pooled from up to 
four identical cold, silver-stained gels. In-gel tryptic digests of pooled spots were 
performed as described previously (29). Tryptic peptides were analyzed by mi- 
crocapillary LC-MS with automated switching to MS/MS mode for peptide 
fragmentation. Spectra were searched against the composite OWL protein se- 
quence database (version 30.2; 250,514 protein sequences) (24a) by using the 
computer program Sequest (8), which matches theoretical and acquired tandem 
mass spectra. A protein match was determined by comparing the number of 
peptides identified and their respective cross-correlation scores. All protein 
identifications were verified by comparison with theoretical molecular weights 
and isoelectric points. 



mRNA quantitation. Velculescu and coworkers have previously generated 
frequency tables for yeast mRNA transcripts from the same strain grown under 
the same stated conditions as described herein (35). The SAGE technology is 
based on two main principles. First, a short sequence tag (15 bp) that contains 
sufficient information uniquely to identify a transcript is generated. A single tag 
is usually generated from each mRNA transcript in the cell which corresponds to 
15 bp at the 3'-most cutting site for NlaIII. Second, many transcript tags can be
concatenated into a single molecule and then sequenced, revealing the identity of 
multiple tags simultaneously. Over 20,000 transcripts were sequenced from yeast 
strain YPH499 growing at mid-log phase on glucose. Assuming the previously 
derived estimate of 15,000 mRNA molecules per cell (16), this would represent 
a 1.3-fold coverage even for mRNA molecules present at a single copy per cell 
and would provide a 72% probability of detecting such transcripts. Computer 
software which took for input the gene detected, examined the nucleotide se- 
quence, and performed the calculation as described by Velculescu and coworkers 
(35) was written. In practice, we found that for 21 of 128 (16%) genes examined 
viable mRNA levels from SAGE data could not be calculated. This was because 
(i) no CATG site was found in the open reading frame (ORF), (ii) a CATG site 
was found but the corresponding 10-bp putative SAGE tag was not found in the 
frequency tables, or (iii) identical putative SAGE tags were present for multiple 
genes (e.g., TDH2_YEAST and TDH3_YEAST).
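
The tag lookup and the three failure cases listed above can be sketched as follows. This is not the software written for the study, which is not reproduced in the text; it is a minimal illustration that assumes the putative tag is the 10 bp immediately 3' of the last CATG site in the ORF and that the frequency table is a mapping from tag sequence to count. The function names and the dictionary layout are hypothetical, and tags that would run past the end of the supplied sequence are simply treated as unavailable.

```python
def putative_sage_tag(orf, site="CATG", tag_len=10):
    """Return the tag_len bases following the 3'-most `site`, or None."""
    seq = orf.upper()
    pos = seq.rfind(site)          # 3'-most occurrence of the cutting site
    if pos == -1:
        return None                # case (i): no CATG site in the ORF
    start = pos + len(site)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


def mrna_tag_counts(orfs, tag_counts):
    """Map gene -> tag count, or None when no level can be calculated."""
    tags = {gene: putative_sage_tag(seq) for gene, seq in orfs.items()}

    # Record which genes share each tag so ambiguous tags can be rejected.
    owners = {}
    for gene, tag in tags.items():
        if tag is not None:
            owners.setdefault(tag, []).append(gene)

    result = {}
    for gene, tag in tags.items():
        if tag is None:                    # case (i)
            result[gene] = None
        elif tag not in tag_counts:        # case (ii): tag absent from tables
            result[gene] = None
        elif len(owners[tag]) > 1:         # case (iii): tag shared by genes
            result[gene] = None
        else:
            result[gene] = tag_counts[tag]
    return result
```

Converting a tag count into copies per cell requires the normalization described by Velculescu and coworkers (35), which is not restated here and is therefore left out of the sketch.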
TABLE 1. Expressed genes identified from 2D gel in Fig. 2

Mol wt  pI    Spot  YPD gene  Protein      mRNA         Codon
              no.   name(a)   abundance    abundance    bias
                              (10^3        (copies/
                              copies/cell) cell)

17,259  6.75  133   CPR1      15.2         61.7         0.769
18,702  4.80  83    EGD2      20.1         5.2          0.724
18,726  4.44  147   YKL056C   61.2         88.4         0.831
18,978  5.95  135   YER067W   3.7          6.7          0.118
19,308  5.04  130   YLR109W   94.4         9.7          0.680
19,681  9.08  136   ATP7      11.0         NA           0.246
20,505  6.07  111   GUK1      16.5         3.7          0.422
21,444  5.25  148   SAR1      5.4          10.4         0.455
21,583  4.98  95    TSA1      110.6        40.1         0.845
22,602  4.30  80    EFB1      66.1         23.8         0.875
23,079  6.29  112   SOD2      12.6         2.2          0.351
23,743  5.44  137   HSP26     NA(d)        0.7          0.434
24,033  5.97  96    ADK1      17.4         16.4         0.656
24,058  4.43  143   YKL117W   29.2         10.4         0.339
24,353  6.30  140   TFS1      8.1          0.7          0.146
24,662  5.85  99    URA5      25.4         6.0          0.359
24,808  6.33  97    GSP1      26.3         5.2          0.735
24,908  8.73  122   RPS5      18.6         NA(c)        0.899
25,081  4.65  81    MRP8      9.3          NA(c)        0.241
25,960  6.06  116   RPE1      5.8          0.7          0.372
26,378  9.55  127   RPS3      96.8         NA(c)        0.863
26,467  5.18  100   VMA4      10.5         3.7          0.427
26,661  5.84  98    TPI1      NA(d)        NA(c)        0.900
27,156  5.56  93    PRE8      6.9          0.7          0.129
27,334  6.13  115   YHR049W   18.4         2.2          0.520
27,472  5.33  92    YNL010W   31.6         3.7          0.421
27,480  8.95  123   GPM1      10.0         169.4        0.902
27,480  8.95  124   GPM1      231.4        169.4        0.902
27,480  8.95  125   GPM1      7.5          169.4        0.902
27,809  5.97  139   HOR2      5.7          0.7          0.381
27,874  4.46  78    YST1      13.6         52.8         0.805
28,595  4.51  41    PUP2      4.4          0.7          0.147
29,156  6.59  114   YMR226C   14.5         2.2          0.283
29,244  8.40  120   DPM1      5.0          11.2         0.362
29,443  5.91  48    PRE4      3.4          3.7          0.162
30,012  6.39  138   PRB1      21.2         1.5          0.449
30,073  4.63  77    BMH1      14.7         28.2         0.454
30,296  7.94  121   OMP2      67.4         41.6         0.499
30,435  6.34  89    GPP1      70.2         11.2         0.703
31,332  5.57  88    ILV6      13.9         3.0          0.402
32,159  5.46  113   IPP1      63.1         3.7          0.752
32,263  6.00  149   HIS1      22.4         4.5          0.232
33,311  5.35  84    SPE3      15.1         6.7          0.468
34,465  5.60  129   ADE1      8.7          5.2          0.305
34,762  5.32  85    SEC14     10.9         6.0          0.373
34,797  5.85  42    URA1      49.5         8.9          0.237
34,799  6.04  90    BEL1      103.2        81.0         0.875
35,556  5.97  43    YDL124W   6.4          4.5          0.206
35,619  8.41  59    TDH1      69.8         32.7(c)      0.940
35,650  5.49  68    CAR1      5.2          3.0          0.339
35,712  6.72  117   TDH2      49.6         473.0(c)     0.982
35,712  6.72  154   TDH2      863.5        473.0(c)     0.982
35,712  6.72  155   TDH2      79.4         473.0(c)     0.982
36,272  4.85  128   APA1      8.7          0.7          0.425
36,358  5.05  75    YJR105W   17.6         17.1         0.522
36,358  5.05  76    YJR105W   27.5         17.1         0.522
36,596  6.37  79    ADH2      58.9         260.0(c)     0.711
36,714  6.30  102   ADH1      746.1        260.0        0.913
36,714  6.30  103   ADH1      17.6         260.0        0.913
36,714  6.30  104   ADH1      61.4         260.0        0.913
36,714  6.30  105   ADH1      52.7         260.0        0.913
37,033  6.23  44    TAL1      44.8         3.7          0.701
37,796  7.36  57    IDH2      29.4         6.7          0.330
37,886  6.49  106   ILV5      76.0         4.5          0.892
38,700  7.83  55    BAT1      30.9         11.2         0.469
38,702  6.24  46    QCR2      [missing]    2.2          0.326
39,477  5.58  86    FBA1      [illegible]  183.6        [illegible]
39,477  5.58  87    FBA1      427.2        183.6        [illegible]
39,540  6.50  150   HOM2      60.3         4.5          [illegible]
39,561  6.12  156   PSA1      96.4         27.5         0.718
41,158  6.01  49    YNL134C   14.9         1.5          [illegible]
41,623  7.18  58    BAT2      [illegible]  [illegible]  [illegible]
41,728  7.29  110   ERG10     24.1         4.5          0.543
41,900  5.42  74    TOM40     22.3         2.2          0.375
42,402  6.29  45    CYS3      6.7          [illegible]  [illegible]
42,883  5.63  67    DYS1      15.8         5.2          0.526
43,409  6.31  107   SER1      10.5         1.5          0.292
43,421  5.59  91    ERG6      2.2          14.1         0.408
44,174  7.32  56    YBR025C   13.1         6.0          0.684
44,682  4.99  72    TIF1      2.9          39.4         0.834
44,707  7.77  108   PGK1      23.7         165.7        0.897
44,707  7.77  109   PGK1      315.2        165.7        0.897
46,080  6.72  30    CAR2      15.4         NA(c)        0.495
46,383  8.52  53    IDP1      7.7          0.7          0.436
46,553  5.98  47    IDP2      32.4         NA(c)        0.197
46,679  6.39  50    ENO1      35.4         0.7          0.930
46,679  6.39  51    ENO1      0.0          0.7          0.930
46,679  6.39  52    ENO1      2.2          0.7          0.930
46,773  5.82  63    ENO2      15.5         289.1        0.960
46,773  5.82  64    ENO2      635.5        289.1        0.960
46,773  5.82  65    ENO2      93.0         289.1        0.960
46,773  5.82  66    ENO2      31.0         289.1        0.960
47,402  6.09  126   COR1      2.5          0.7          0.422
47,666  8.98  54    AAT2      11.7         6.0          0.338
48,364  5.25  73    WTM1      74.5         13.4         0.365
48,530  6.20  61    MET17     38.1         29.0         0.576
48,904  5.18  69    LYS9      16.2         3.7          0.463
48,987  4.90  153   SUP45     29.6         11.9         0.377
49,727  5.47  70    PRO2      13.6         5.2          0.297
49,912  9.27  62    TEF2      558.5        282.0        0.932
50,444  5.67  35    YDR190C   4.8          2.2          0.228
50,837  6.11  32    YEL047C   3.8          1.5          0.387
50,891  4.59  151   TUB2      11.2         7.4          0.404
51,547  6.80  27    LPD1      18.9         2.2          0.351
52,216  7.25  29    SHM2      19.7         7.4          0.722
52,859  5.54  37    YFR044C   30.2         6.7          0.442
53,798  5.19  71    HXK2      26.5         7.4          0.756
53,803  6.05  145   GYP6      4.4          0.7          0.147
54,403  5.29  39    ALD6      37.7         2.2          0.664
54,403  5.29  40    ALD6      6.6          2.2          0.664
54,502  6.20  31    ADE13     6.3          1.5          0.417
54,543  7.75  25    PYK1      225.3        101.8        0.965
54,543  7.75  26    PYK1      39.8         101.8        0.965
55,221  6.66  146   YEL071W   16.3         3.0          0.244
55,295  4.35  134   PDI1      66.2         14.1         0.589
55,364  5.98  24    GLK1      22.6         6.0          0.237
55,481  7.97  118   ATP1      21.6         2.2          0.637
55,886  6.47  28    CYS4      22.2         NA(c)        0.444
56,167  5.83  33    ARO8      14.3         3.0          0.324
56,167  5.83  34    ARO8      9.1          3.0          0.324
56,584  6.36  20    CYB2      18.9         NA(c)        0.259
[missing]  5.53  [illegible]  [missing]  [illegible]  [illegible]  [illegible]
57,383  5.98  144   ZWF1      5.6          0.7          0.215
57,464  5.49  36    THR4      21.4         3.7          0.508
57,512  5.50  7     SRV2      6.5          NA(c)        0.260
57,727  4.92  152   VMA2      33.7         8.9          0.546
58,573  6.47  17    ACH1      4.4          1.5          0.327
58,573  6.47  18    ACH1      5.4          1.5          0.327
61,353  5.87  21    PDC1      6.5          200.7        0.962
61,353  5.87  22    PDC1      303.2        200.7        0.962
61,353  5.87  23    PDC1      16.3         200.7        0.962
61,649  5.54  38    CCT8      2.2          1.5          0.271
61,902  6.21  101   PDC5      4.3          NA(c)        0.828
62,266  6.19  16    ICL1      20.1         NA(c)        0.327
62,862  8.02  19    ILV3      5.3          4.5          0.548
63,082  6.40  119   PGM2      2.2          3.0          0.402
64,335  5.77  5     PAB1      30.4         1.5          0.616
66,120  5.42  8     STI1      6.7          0.7          0.313
66,120  5.42  9     STI1      6.4          0.7          0.313
66,450  5.29  141   SSB2      7.0          NA(c)        0.880
66,450  5.29  142   SSB2      2.3          NA(c)        0.880
66,456  5.23  10    SSB1      64.5         79.5         0.907
66,456  5.23  11    SSB1      59.0         [missing]    [missing]

(a) YPD gene names are available from the YPD website (39).
(b) NA, calculation could not be performed or was not available.
(c) mRNA data inconclusive or NA.
(d) No methionines in predicted ORF; therefore, protein concentration was not
determined.
(e) Measured molecular weight or pI did not match theoretical molecular weight
or pI.



Protein quantitation. [35S]methionine-labeled gels were exposed to X-ray film
overnight, and then the silver stain and film were used to excise 156 spots of
varying intensities, molecular weights, and pIs. The excised spots were placed in
0.6-ml microcentrifuge tubes, and scintillation cocktail (100 µl) was added. The
samples were vortexed and counted. In addition, two parallel gels were electroblotted
to polyvinylidene difluoride membranes. The membranes were exposed
to X-ray film, and four intense single spots were excised from each membrane 
and subjected to amino acid analysis. For these four spots, a mean of 209 ± 4 
cpm/pmol of protein/methionine was found. This number was used to quantitate 
all remaining spots in conjunction with the number of methionines present in the 
protein. 
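
The conversion from counts to protein amount described above can be written out as a short calculation. The sketch below uses the stated specific activity of 209 cpm/pmol of protein per methionine; the step from picomoles to copies per cell additionally needs the number of cells represented by the loaded sample, which is not given in this section, so `cells_loaded` is a hypothetical input and the function names are illustrative only.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def protein_pmol(spot_cpm, n_met, cpm_per_pmol_per_met=209.0):
    """Picomoles of protein in a spot from its counts and methionine number."""
    if n_met <= 0:
        # Proteins without methionine cannot be quantitated this way
        # (compare footnote d of Table 1).
        raise ValueError("protein must contain at least one methionine")
    return spot_cpm / (cpm_per_pmol_per_met * n_met)


def copies_per_cell(pmol, cells_loaded):
    """Convert picomoles in a spot to molecules per cell.

    cells_loaded is an assumed quantity: the number of cells whose
    protein was loaded onto the gel.
    """
    return pmol * 1e-12 * AVOGADRO / cells_loaded
```

For example, a spot of 2,090 cpm from a protein with five methionines would correspond to 2.0 pmol under the stated specific activity.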

To ensure that proteins were labeled to equilibrium, parallel 2D gels were 
prepared and run on yeast metabolically labeled for 1, 2, 6, or 18 h. The 
corresponding 156 spots were excised from each gel, and radioactivity was mea- 
sured by liquid scintillation counting for each spot. Calculated protein levels were 
highly reproducible for all time points measured after 1 h. 

Calculation of codon bias and predicted half-life. Codon bias values were 
extracted from the YPD spreadsheet (17). Protein half-lives were calculated 
based on the N-end rule (33). When the N-terminal processing was not known 
experimentally, it was predicted based on the affinity of methionine aminopep- 
tidase (31). 
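
The prediction of N-terminal processing affects both the predicted half-life and the number of methionines used for quantitation. A minimal sketch of such a prediction is given below. It assumes the commonly cited specificity of methionine aminopeptidase, namely that the initiator methionine is removed when the second residue is small (Gly, Ala, Ser, Cys, Thr, Pro, or Val); the exact rule applied in the study is that of reference 31 and is not reproduced in the text, so this residue set and the function names are assumptions.

```python
# Second residues assumed to permit removal of the initiator methionine.
SMALL_RESIDUES = set("GASCTPV")


def mature_sequence(orf_protein):
    """Return the predicted mature sequence after initiator-Met processing."""
    if (len(orf_protein) >= 2
            and orf_protein[0] == "M"
            and orf_protein[1] in SMALL_RESIDUES):
        return orf_protein[1:]      # initiator methionine predicted removed
    return orf_protein              # initiator methionine predicted retained


def methionine_count(orf_protein):
    """Number of methionines in the predicted mature protein."""
    return mature_sequence(orf_protein).count("M")
```

The first residue of `mature_sequence(...)` is the residue to which the N-end rule (33) would then be applied.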

RESULTS 

Characteristics of proteome approach. Nearly every facet of 
proteome analysis hinges on the unambiguous identification of 
large numbers of expressed proteins in cells. Several tech- 
niques have been described previously for the identification of 
proteins separated by 2DE, including N-terminal and internal 
sequencing (1, 2), amino acid analysis (38), and more recently 
mass spectrometry (25). We utilized techniques based on mass 
spectrometry because they afford the highest levels of sensitiv- 
ity and provide unambiguous identification. The specific pro- 
cedure used is schematically illustrated in Fig. 1 and is based 
on three principles. First, proteins are removed from the gel by 



proteolytic in-gel digestion, and the resulting peptides are sep- 
arated by on-line capillary high-performance liquid chromatog- 
raphy. Second, the eluting peptides are ionized and detected, and 
the specific peptide ions are selected and fragmented by the 
mass spectrometer. To achieve this, the mass spectrometer 
switches between the MS mode (for peptide mass identifica- 
tion) and the MS/MS mode (for peptide characterization and 
sequencing). Selected peptides are fragmented by a process 
called collision-induced dissociation (CID) to generate a tan- 
dem mass spectrum (MS/MS spectrum) that contains the pep- 
tide sequence information. Third, individual CID mass spectra 
are then compared by computer algorithms to predicted spec- 
tra from a sequence database. This results in the identification 
of the peptide and, by association, the protein(s) in the spot. 
Unambiguous protein identification is attained in a single anal- 
ysis by the detection of multiple peptides derived from the 
same protein. 
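
The comparison of a CID spectrum with predicted spectra rests on computing, for each candidate peptide, the masses of the two fragment ion series that read inward from the N and C termini. The sketch below computes singly charged b- and y-ion m/z values from standard monoisotopic residue masses. It is an illustration of the principle and not the scoring performed by the database search program; it assumes unmodified residues (cysteine alkylation by iodoacetamide is not included) and a single positive charge.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # H2O
PROTON = 1.00728    # H+


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i] is b(i+1), read from the N terminus;
    y_ions[i] is y(i+1), read from the C terminus.
    """
    masses = [MONO[aa] for aa in peptide]

    b_ions, running = [], 0.0
    for m in masses[:-1]:               # b1 .. b(n-1)
        running += m
        b_ions.append(running + PROTON)

    y_ions, running = [], 0.0
    for m in reversed(masses[1:]):      # y1 .. y(n-1)
        running += m
        y_ions.append(running + WATER + PROTON)

    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("FAVGAFTDSLR")
    print([round(x, 3) for x in b])
    print([round(x, 3) for x in y])
```

A tryptic peptide ending in arginine, such as FAVGAFTDSLR, gives a y1 ion near m/z 175.119, which is the residue mass of arginine plus water plus one proton.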

Protein identification. Yeast total cell protein lysate (40 µg),
metabolically labeled with [35S]methionine, was electrophoretically
separated by isoelectric focusing in the first dimension
and by SDS-10% polyacrylamide gel electrophoresis in
the second dimension. Proteins were visualized by silver stain- 
ing and by autoradiography. Of the more than 1,000 proteins 
visible by silver staining, 156 spots were excised from the gel 
and subjected to in-gel tryptic digestion, and the resulting 
peptides were analyzed and identified by microspray LC- 
MS/MS techniques as described above. The proteins in this 
study were all identified automatically by computer software 
with no human interpretation of mass spectra. They are indi- 
cated in Fig. 2 and detailed in Table 1. 

The CID spectra shown in Fig. 3 indicate that the quality of 
the identification data generated was suitable for unambiguous 
protein identification. The spectra represent the amino acid 
sequences of tryptic peptides NSGDIVNLGSIAGR (Fig. 3A) 
and FAVGAFTDSLR (Fig. 3B). Both peptides were derived 
from protein S57593 (hypothetical protein YMR226C), which 
migrated to spot 114 (molecular weight, 29,156; pI, 6.59) in the
2D gel in Fig. 2. Five other peptides from the same analysis 
were also computer matched to the same protein sequence. 

Protein and mRNA quantitation. For the 156 genes investi- 
gated, the protein expression levels ranged from 2,200 (PGM2) 
to 863,000 (TDH2/TDH3) copies/cell. The levels of mRNA for 
each of the genes identified were calculated from SAGE fre- 
quency tables (35). These tables contain the mRNA levels for 
4,665 genes in yeast strain YPH499 grown to mid-log phase in 
YPD medium on glucose as a carbon source. In some in- 
stances, the mRNA levels could not be calculated for reasons 
stated in Materials and Methods. For the proteins analyzed in 
this study, mean transcript levels varied from 0.7 to 473 copies/ 
cell. 

Selection of the sample population for mRNA-protein ex- 
pression level correlation. The protein spots selected for iden- 
tification were selected from spots visible by silver staining in 
the 2D gel. An attempt was made not to include spots where 
overlap with other spots was readily apparent. The number of 
proteins identified was 156 (Table 1). Some proteins migrated 
to more than one spot (presumably due to differential protein 
processing or modifications), and protein levels from these 
spots were calculated by integrating the intensities of the dif- 
ferent spots. The 156 protein spots analyzed represented the 
products of 128 different genes. Genes were excluded from the 
correlation analysis only if part of the data set was missing; i.e.,
genes were excluded if (i) no mRNA expression data were
available for the protein or putative SAGE tags were ambiguous,
(ii) the amino acid sequence did not contain methionine,
(iii) more than a single protein was conclusively identified as
migrating to the same gel spot, or (iv) the theoretical and
observed pIs and molecular weights could not be reconciled.
After these criteria were applied, the number of genes used in
the correlation analysis was 106.
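
The four exclusion criteria amount to a simple filter over the per-gene records. The sketch below shows one way to express it; the record layout and field names are hypothetical and are chosen only to mirror criteria (i) through (iv), and the records in the usage example are invented placeholders, not entries from Table 1.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneRecord:
    name: str
    mrna_copies: Optional[float]   # None if SAGE data missing or ambiguous (i)
    n_met: int                     # methionines in the mature protein (ii)
    comigrating: bool              # more than one protein in the spot (iii)
    mw_pi_reconciled: bool         # observed vs. theoretical MW and pI (iv)


def usable(rec: GeneRecord) -> bool:
    """True if the record passes all four criteria."""
    return (rec.mrna_copies is not None
            and rec.n_met > 0
            and not rec.comigrating
            and rec.mw_pi_reconciled)


def select_for_correlation(records: List[GeneRecord]) -> List[GeneRecord]:
    """Keep only genes with a complete data set."""
    return [rec for rec in records if usable(rec)]


if __name__ == "__main__":
    records = [
        GeneRecord("geneA", 5.2, 7, False, True),    # kept
        GeneRecord("geneB", None, 7, False, True),   # excluded by (i)
        GeneRecord("geneC", 2.2, 0, False, True),    # excluded by (ii)
    ]
    print([r.name for r in select_for_correlation(records)])
```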



Codon bias and predicted half-lives. Codon bias is thought 
to be an indicator of protein expression, with highly expressed 
proteins having large codon bias values. The codon bias distri- 
bution for the entire set of more than 6,000 predicted yeast 
gene ORFs is presented in Fig. 4A. The interval with the
largest frequency of genes is between the codon bias values of 
0.0 and 0.1. This segment contains more than 2,500 genes. The 
distribution of the codon bias values of the 128 different genes 
found in this study (all protein spots from Fig. 2) is shown in 
Fig. 4B, and protein half-lives (predicted from applying the 
N-end rule [33] to the experimentally determined or predicted 
protein N termini) are shown in Fig. 4C. No genes were iden- 
tified with codon bias values less than 0.1 even though thou- 
sands of genes exist in this category. In addition, nearly all of 
the proteins identified had long predicted half-lives (greater 
than 30 h). 

Correlation of mRNA and protein expression levels. The
correlation between mRNA and protein levels of the genes
selected as described above is shown in Fig. 5. For the entire 
group (106 genes) for which a complete data set was gener- 
ated, there was a general trend of increased protein levels 
resulting from increased mRNA levels. The Pearson product 
moment correlation coefficient for the whole data set (106 
genes) was 0.935. This number is highly biased by a small 
number of genes with very large protein and message levels. A 
more representative subset of the data is shown in the inset of 
Fig. 5. It shows genes for which the message level was below 10 
copies/cell and includes 69% (73 of 106 genes) of the data used 
in the study. The Pearson product moment correlation coeffi- 
cient for this data set was only 0.356. We also found that levels 
of protein expression coded for by mRNA with comparable 
abundance varied by as much as 30-fold and that the mRNA 
levels coding for proteins with comparable expression levels 
varied by as much as 20-fold. 
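
The sensitivity of the Pearson product moment correlation coefficient to a few very large values can be illustrated with a small calculation. The sketch below implements the coefficient directly and applies it to synthetic numbers that are invented for illustration and are not taken from Table 1: six low-abundance pairs with no clear trend and two pairs with very high mRNA and protein levels.

```python
import math


def pearson(xs, ys):
    """Pearson product moment correlation coefficient of two sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic, illustrative values only (mRNA copies/cell, protein 10^3 copies/cell).
mrna_low = [1, 2, 3, 4, 5, 6]
protein_low = [30, 5, 20, 8, 25, 10]
mrna_all = mrna_low + [200, 300]
protein_all = protein_low + [500, 800]

if __name__ == "__main__":
    print("low-abundance subset:", round(pearson(mrna_low, protein_low), 3))
    print("with two abundant genes:", round(pearson(mrna_all, protein_all), 3))
```

In this toy data set the low-abundance pairs alone give a weak (here slightly negative) coefficient, whereas adding the two abundant pairs drives the coefficient close to 1, the same qualitative behavior as the difference between 0.356 and 0.935 reported above.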

The distortion of the correlation value induced by the un- 
even distribution of the data points along the x axis is further 
demonstrated by the analysis in Fig. 6. The 106 samples in- 
cluded in the study were ranked by protein abundance, and the 
Pearson product moment correlation coefficient was repeat- 
edly calculated after including progressively more, and higher- 
abundance, proteins in each calculation. The correlation values 
remained relatively stable in the range of 0.1 to 0.4 if the 
lowest-expressed 40 to 95 proteins used in this study were 
included. However, the correlation value steadily climbed by 
the inclusion of each of the 11 very highly expressed proteins. 
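
The ranked, cumulative calculation described for Fig. 6 can be sketched as follows. The code ranks genes by protein abundance and recomputes the Pearson coefficient as each higher-abundance protein is added. The starting subset size is a parameter because the text reports values from the 40 lowest-expressed proteins upward; the function names are hypothetical and the numbers in the usage example are synthetic.

```python
import math


def pearson(xs, ys):
    """Pearson coefficient; returns NaN if either sequence has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def cumulative_correlation(mrna, protein, start=3):
    """Return [(k, r_k)] where r_k uses the k lowest-abundance proteins."""
    ranked = sorted(zip(protein, mrna))       # rank by protein abundance
    results = []
    for k in range(start, len(ranked) + 1):
        subset = ranked[:k]
        prot_k = [p for p, _ in subset]
        mrna_k = [m for _, m in subset]
        results.append((k, pearson(mrna_k, prot_k)))
    return results


if __name__ == "__main__":
    # Synthetic, illustrative values only.
    mrna = [1, 2, 3, 4, 5, 6, 200, 300]
    protein = [30, 5, 20, 8, 25, 10, 500, 800]
    for k, r in cumulative_correlation(mrna, protein):
        print(k, round(r, 3))
```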

Correlation of protein and mRNA expression levels with 
codon bias. Codon bias is the propensity for a gene to utilize 
the same codon to encode an amino acid even though other 
codons would insert the identical amino acid in the growing 
polypeptide sequence. It is further thought that highly ex- 
pressed proteins have large codon biases (3). To assess the 
value of codon bias for predicting mRNA and protein levels in 
exponentially growing yeast cells, we plotted the two experimental
sets of data versus the codon bias (Fig. 7). The distribution
patterns for both mRNA and protein levels with respect
to codon bias were highly similar. There was high variability in 
the data within the codon bias range of 0.8 to 1.0. Although a 
large codon bias generally resulted in higher protein and mes- 
sage expression levels, codon bias did not appear to be predic- 
tive of either protein levels or mRNA levels in the cell. 

DISCUSSION 

The desired end point for the description of a biological 
system is not the analysis of mRNA transcript levels alone but 
also the accurate measurement of protein expression levels and 
their respective activities. Quantitative analysis of global 
mRNA levels currently is a preferred method for the analysis 
of the state of cells and tissues (11). Several methods which 
either provide absolute mRNA abundance (34, 35) or relative
mRNA levels in comparative analyses (20, 27) have been described
elsewhere. The techniques are fast and exquisitely sen-
sitive and can provide mRNA abundance for potentially any 
expressed gene. Measured mRNA levels are often implicitly or 
explicitly extrapolated to indicate the levels of activity of the 
corresponding protein in the cell. Quantitative analysis of pro- 
tein expression levels (proteome analysis) is much more time- 
consuming because proteins are analyzed sequentially one by 
one and is not general because analyses are limited to the 
relatively highly expressed proteins. Proteome analysis does, 
however, provide types of data that are of critical importance 
for the description of the state of a biological system and that 
are not readily apparent from the sequence and the level of 
expression of the mRNA transcript. This study attempts to 
examine the relationship between mRNA and protein expres- 
sion levels for a large number of expressed genes in cells 
representing the same state. 

Limits in the sensitivity of current protein analysis technol- 
ogy precluded a completely random sampling of yeast proteins. 
We therefore based the study on those proteins visible by silver 
staining on a 2D gel. Of the more than 1,000 visible spots, 156
were chosen to include the entire range of molecular weights, 
isoelectric focusing points, and staining intensities displayed on 
the 2D protein pattern. The genes identified in this study 
shared a number of properties. First, all of the proteins in this 
study had a codon bias of greater than 0.1 and 93% were 
greater than 0.2 (Fig. 4B). Second, with few exceptions, the 
proteins in this study had long predicted half-lives according to 
the N-end rule (Fig. 4C). Third, low-abundance proteins with 
regulatory functions such as transcription factors or protein 
kinases were not identified. 

Because the population of proteins used in this study ap- 
pears to be fairly homogeneous with respect to predicted half- 
life and codon bias, it might be expected that the correlation of 
the mRNA and protein expression levels would be stronger for 
this population than for a random sample of yeast proteins. We 
tested this assumption by evaluating the correlation value if 
different subsets of the available data were included in the 
calculation. The 106 proteins were ranked from lowest to high- 
est protein expression level, and the trend in the correlation 
value was evaluated by progressively including more of the 
higher-abundance proteins in the calculation (Fig. 6). The cor- 
relation value when only the lower-abundance 40 to 93 pro- 
teins were examined was consistently between 0.1 and 0.4. If 
the 11 most abundant proteins were included, the correlation 
steadily increased to 0.94. We therefore expect that the corre- 
lation for all yeast proteins or for a random selection would be 
less than 0.4. The observed level of correlation between 
mRNA and protein expression levels suggests the importance 



of posttranslational mechanisms controlling gene expression. 
Such mechanisms include translational control (15) and con- 
trol of protein half-life (33). Since these mechanisms are also 
active in higher eukaryotic cells, we speculate that there is no 
predictive correlation between steady-state levels of mRNA 
and those of protein in mammalian cells. 

Like other large-scale analyses, the present study has several 
potential sources of error related to the methods used to de- 
termine mRNA and protein expression levels. The mRNA 
levels were calculated from frequency tables of SAGE data. 
This method is highly quantitative because it is based on actual 
sequencing of unique tags from each gene, and the number of 
times that a tag is represented is proportional to the number of 
mRNA molecules for a specific gene. This method has some 
limitations including the following: (i) the magnitude of the 
error in the measurement of mRNA levels is inversely propor- 
tional to the mRNA levels, (ii) SAGE tags from highly similar 
genes may not be distinguished and therefore are summed, (iii) 
some SAGE tags are from sequences in the 3' untranslated 
region of the transcript, (iv) incomplete cleavage at the SAGE 
tag site by the restriction enzyme can result in two tags repre- 
senting one mRNA, and (v) some transcripts actually do not 
generate a SAGE tag (34, 35). 

For the SAGE method, the error associated with a value 
increases with a decreasing number of transcripts per cell. The 
conclusions drawn from this study are dependent on the qual- 
ity of the mRNA levels from previously published data (35). 
Since more than 65% of the mRNA levels included in this 
study were calculated to 10 copies/cell or less (40% were less 
than 4 copies/cell), the error associated with these values may
be quite large. The mRNA levels were calculated from more 
than 20,000 transcripts. Assuming that the estimate of 15,000 
mRNA molecules per cell is correct (16), this would mean that 
mRNA transcripts present at only a single copy per cell would 
be detected 72% of the time (35). The mRNA levels for each 
gene were carefully scrutinized, and only mRNA levels for 
which a high degree of confidence existed were included in the 
correlation value. 
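
The sampling argument above can be illustrated with a short calculation. This is a minimal sketch, assuming each sequenced tag is an independent draw from the cellular mRNA pool; the function name and the independence assumption are ours and are not taken from reference 35, whose exact derivation is not reproduced here.

```python
def detection_probability(tags_sequenced: int,
                          mrnas_per_cell: int,
                          copies_per_cell: float = 1.0) -> float:
    """Chance that at least one tag from a given transcript is sampled.

    Assumes (hypothetically) that every sequenced tag is an independent
    draw and that a transcript's share of the pool is
    copies_per_cell / mrnas_per_cell.
    """
    share = copies_per_cell / mrnas_per_cell
    return 1.0 - (1.0 - share) ** tags_sequenced


# Figures from the text: 20,000 tags, 15,000 mRNA molecules per cell.
p_one_copy = detection_probability(20_000, 15_000, copies_per_cell=1)
p_ten_copies = detection_probability(20_000, 15_000, copies_per_cell=10)
print(f"single copy: {p_one_copy:.2f}, ten copies: {p_ten_copies:.4f}")
```

Under these assumptions a single-copy transcript is detected roughly three times out of four, which is close to, though not identical with, the 72% quoted from reference 35. The sketch also shows why the uncertainty is concentrated in the low-copy transcripts: at ten copies per cell, detection is essentially certain.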

Protein abundance was determined by metabolic radiolabeling
with [35S]methionine. The calculation required knowledge
of three variables: the number of methionines in the mature 
protein, the radioactivity contained in the protein, and the 
specific activity of the radiolabel normalized per methionine. 
The number of methionines per protein was determined from 
the amino acid sequence of the proteins identified by tandem 
mass spectrometry. For some proteins, it was not known 
whether the methionine of the nascent polypeptide was pro- 
cessed away. The N termini of those proteins were predicted 
based on the specificity of methionine aminopeptidase (31). If 
the N-terminal processing did not conform to the predicted 
specificity of processing enzymes, the calculation of the num- 
ber of methionines would be affected. This discrepancy would 
affect most the quantitation of a protein with a very low num- 
ber of methionines. The average number of calculated methi- 
onines per protein in this study was 7.2. We therefore expect 
the potential for erroneous protein quantitation due to un- 
usual N-terminal processing to be small. 
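
The three-variable calculation described above, and its sensitivity to a miscounted N-terminal methionine, can be sketched as follows. The function names, the units (cpm and pmol), and the numerical inputs are hypothetical and chosen only for illustration; they are not values from this study.

```python
def protein_pmol(spot_cpm: float, met_count: int,
                 cpm_per_pmol_met: float) -> float:
    """Protein amount in a spot from its radioactivity.

    spot_cpm         : radioactivity measured in the spot (hypothetical units)
    met_count        : methionines in the mature protein
    cpm_per_pmol_met : specific activity normalized per methionine
    """
    if met_count <= 0:
        raise ValueError("protein must contain at least one methionine")
    return spot_cpm / (cpm_per_pmol_met * met_count)


def relative_error(true_met: int, assumed_met: int) -> float:
    """Fractional error in the estimate when the methionine count is wrong.

    The estimate scales as 1 / assumed_met, so estimate / truth equals
    true_met / assumed_met.
    """
    return true_met / assumed_met - 1.0


# Hypothetical spot: 7,000 cpm, 7 methionines, 100 cpm per pmol methionine.
amount = protein_pmol(7_000, 7, 100.0)

# Wrongly retaining an initiator methionine that was in fact removed:
err_low_met = relative_error(true_met=1, assumed_met=2)
err_typical = relative_error(true_met=7, assumed_met=8)
print(amount, err_low_met, err_typical)
```

With one methionine too many assumed, a protein that really has a single methionine is underestimated by half, whereas a protein with seven is underestimated by about one eighth. This is consistent with the statement that the discrepancy matters most for proteins with very few methionines.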



The amount of radioactivity contained in a single spot might 
be the sum of the radioactivity of comigrating proteins. Be- 
cause protein identification was based on tandem mass spec- 
trometric techniques, comigrating proteins could be identified. 
However, comigrating proteins were rarely detected in this 
study, most likely because relatively small amounts of total 
protein (40 µg) were initially loaded onto the gels, which resulted
in highly focused spots containing generally 1 to 25 ng of
protein. Because of the relatively small amount loaded, the 
concentrations of any potentially comigrating protein would 
likely be below the limit of detection of the mass spectrometry 
technique used in this study (1 to 5 ng) and below the limit of 
visualization by silver staining (1 to 5 ng). In the overwhelming 
majority of the samples analyzed, numerous peptides from a 
single protein were detected. It is assumed that any comigrating
proteins were at levels too low to be detected and that their
influence in the calculation would be small. 

The specific activity of the radiolabel was determined by 
relating the precise amount of protein present in selected spots 
of a parallel gel, as determined by quantitative amino acid 
composition analysis, to the number of methionines present in 
the sequence of those proteins and the radioactivity deter- 
mined by liquid scintillation counting. It is possible that the 
resulting number might be influenced by unavoidable losses 
inherent in the amino acid analysis procedure applied. Because 
four different proteins were utilized in the calculation and the 
experiment was done in duplicate, the specific activity calcu- 
lated is thought to be highly accurate. Indeed, the specific 
activities calculated for each of the four proteins varied by less
than 10%. Any inconsistencies in the calculation of the specific 
activity would result in differences in the absolute levels calcu- 
lated but not in the relative numbers and would therefore not 
influence the correlation value determined. 
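
The claim that an error in the specific activity changes absolute but not relative levels, and therefore leaves the correlation untouched, follows from the scale invariance of the correlation coefficient. The sketch below assumes a Pearson correlation and uses made-up abundance values; neither the data nor the 10% error factor are taken from the study.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical mRNA (copies/cell) and protein (copies/cell) levels.
mrna = [0.8, 2.5, 6.0, 35.0, 120.0]
protein = [30_000, 9_000, 85_000, 210_000, 640_000]

# A 10% error in specific activity rescales every protein value alike.
protein_rescaled = [p * 1.10 for p in protein]

r_original = pearson(mrna, protein)
r_rescaled = pearson(mrna, protein_rescaled)
print(r_original, r_rescaled)
```

Both calls return the same coefficient up to floating-point rounding, because a common multiplicative factor cancels between the covariance and the standard deviation.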

The protein quantitative method used eliminates a number 
of potential errors inherent in previous methods for the quan- 
titation of proteins separated by 2DE, such as preferential 
protein staining and bias caused by inequalities in the number 
of radiolabeled residues per protein. Any 2D gel-based method 
of quantitation is complicated by the fact that in some cases the 
translation products of the same mRNA migrated to different 
spots. One major reason is posttranslational modification or 
processing of the protein. Also, artifactual proteolysis during 
cell lysis and sample preparation can lead to multiple resolved 
forms of the protein. In such cases, the protein levels of spots 
coded for by the same mRNA were pooled. In addition, the 
existence of other spots coded for by the same mRNA that 
were not analyzed by mass spectrometry or that were below the 
limit of detection for silver staining cannot be ruled out. How- 
ever, since this study is based on a class of highly expressed 
proteins, the presence of undetected minor spots below silver 
staining sensitivity corresponding to a protein analyzed in the 
study would generally cause a relatively small error in protein 
quantitation. 

Codon bias is a measure of the propensity of an organism to 
selectively utilize certain codons which result in the incorpo- 
ration of the same amino acid residue in a growing polypeptide 
chain. There are 61 possible codons that code for 20 amino 
acids. The larger the codon bias value, the smaller the number 
of codons that are used to encode the protein (19). It is
thought that codon bias is a measure of protein abundance
because highly expressed proteins generally have large codon 
bias values (3, 13). 

Nearly all of the most highly expressed proteins had codon 
bias values of greater than 0.8. However, we detected a number 
of genes with high codon bias and relatively low protein abundance
(Fig. 7). For example, the expressed gene with both the
second largest protein and mRNA levels in the study was
ENO2_YEAST (775,000 and 289.1 copies/cell, respectively).
ENO1_YEAST was also present in the gel at much lower
protein and mRNA levels (44,200 and 0.7 copies/cell, respectively).
The codon bias values for ENO2 and ENO1 are similar
(0.96 and 0.93, respectively), but the expression of the two
genes is differentially regulated. Specifically, ENO1_YEAST is
glucose repressed (6) and was therefore present in low abundance
under the conditions used. Other genes with large codon
bias values that were not of high protein abundance in the gel
include EFT1, TIF1, HXK2, GSP1, EGD2, SHM2, and TAL1.
We conclude that merely determining the codon bias of a gene 
is not sufficient to predict its protein expression level. 

Interestingly, codon bias appears to be an excellent indicator 
of the boundaries of current 2D gel proteome analysis tech- 
nology. There are thousands of genes with expressed mRNA 
and likely expressed protein with codon bias values less than
0.1 (Fig. 4A). In this study, we detected none of them, and only
a very small percentage of the genes detected in this study had 
codon bias values between 0.1 and 0.2 (Fig. 4B). Indeed, in 
every examined yeast proteome study (5, 7, 13, 28) where the 
combined total number of identified proteins is 300 to 400, this 
same observation is true. It is expected that for the more 
complex cells of higher eukaryotic organisms the detection of 
low-abundance proteins would be even more challenging than
for yeast. This indicates that highly abundant, long-lived pro- 
teins are overwhelmingly detected in proteome studies. If pro- 
teome analysis is to provide truly meaningful information 
about cellular processes, it must be able to penetrate to the 
level of regulatory proteins, including transcription factors and 
protein kinases. A promising approach is the use of narrow- 
range focusing gels with immobilized pH gradients (IPG) (23). 
This would allow for the loading of significantly more protein 
per pH unit covered and also provide increased resolution of 
proteins with similar electrophoretic mobilities. A standard pH 
gradient in an isoelectric focusing gel covers a 7-pH-unit range 
(pH 3 to 10) over 18 cm. A narrow-range focusing gel might
devote 18 cm or more to a range of only 0.5 pH units. This
could potentially increase by more than 10-fold the number of 
proteins that can be detected. Clearly, current proteome tech- 
nology is incapable of analyzing low-abundance regulatory pro- 
teins without employing an enrichment method for relatively 
low-abundance proteins. In conclusion, this study examined 
the relationship between yeast protein and message levels and 
revealed that transcript levels provide little predictive value 
with respect to the extent of protein expression. 
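
The gain expected from narrow-range focusing gels, discussed earlier in this paragraph, can be checked with simple arithmetic. This is a sketch using only the gel length and pH spans given in the text; the function name is ours, and the result describes the spatial expansion per pH unit, not a measured increase in detected proteins.

```python
def cm_per_ph_unit(gel_length_cm: float, ph_span: float) -> float:
    """Gel length available to resolve each pH unit."""
    return gel_length_cm / ph_span


standard = cm_per_ph_unit(18.0, 7.0)  # pH 3 to 10 over 18 cm
narrow = cm_per_ph_unit(18.0, 0.5)    # 0.5 pH units over 18 cm
fold_expansion = narrow / standard
print(f"{standard:.2f} cm/pH vs {narrow:.1f} cm/pH: {fold_expansion:.0f}-fold")
```

The 14-fold expansion per pH unit is consistent with the estimate of a more than 10-fold increase in detectable proteins, although the actual gain would also depend on loading capacity and on how proteins are distributed across the pH range.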